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ABSTRACT 

Aims. The scope of this paper is twofold. First, it describes the simulation scenarios and the results of a large-scale, double-blind test
campaign carried out to estimate the potential of Gaia for detecting and measuring planetary systems. The identified capabilities are
then put in context by highlighting the unique contribution that the Gaia exoplanet discoveries will be able to bring to the science of
extrasolar planets in the next decade.

Methods. We use detailed simulations of the Gaia observations of synthetic planetary systems and develop and utilize independent
software codes in double-blind mode to analyze the data, including statistical tools for planet detection and different algorithms for
single and multiple Keplerian orbit fitting that use no a priori knowledge of the true orbital parameters of the systems.

Results. 1) Planets with astrometric signatures α ≃ 3 times the assumed single-measurement error σ_ψ and period P < 5 yr can
be detected reliably and consistently, with a very small number of false positives. 2) At twice the detection limit, uncertainties in
orbital parameters and masses are typically 15% - 20%. 3) Over 70% of two-planet systems with well-separated periods in the range
0.2 < P < 9 yr, astrometric signal-to-noise ratio 2 < α/σ_ψ < 50, and eccentricity e < 0.6 are correctly identified. 4) Favorable orbital
configurations (both planets with P < 4 yr and α/σ_ψ > 10, redundancy over a factor of 2 in the number of observations) have orbital
elements measured to better than 10% accuracy > 90% of the time, and the value of the mutual inclination angle i_rel determined with
uncertainties < 10°. 5) Finally, nominal uncertainties obtained from the fitting procedures are a good estimate of the actual errors
in the orbit reconstruction. Extrapolating from the present-day statistical properties of the exoplanet sample, the results imply that
a Gaia with σ_ψ = 8 μas, in its unbiased and complete magnitude-limited census of planetary systems, will discover and measure
several thousands of giant planets out to 3-4 AUs from stars within 200 pc, and will characterize hundreds of multiple-planet systems,
including meaningful coplanarity tests. Finally, we put Gaia's planet discovery potential into context, identifying several areas of
planetary-system science (statistical properties and correlations, comparisons with predictions from theoretical models of formation
and evolution, interpretation of direct detections) in which Gaia can be expected, on the basis of our results, to have a relevant impact,
when combined with data coming from other ongoing and future planet search programs.

Key words. planetary systems - astrometry - methods: data analysis - methods: numerical - methods: statistical - stars: statistics

1. Introduction

The continuously increasing catalog of extrasolar planets is today
surpassing the threshold of 270 planets announced¹. Most
of the nearby (d < 200 - 300 pc) exoplanet candidates have been
detected around F-G-K-M dwarfs by long-term, high-precision
(1 - 5 m s⁻¹) Doppler search programs (e.g., Butler et al. 2006,
and references therein; Udry et al. 2007, and references therein).
Over a dozen of these are 'hot Jupiters' with orbital periods
P ≃ 1 - 20 days discovered to cross the disk of their relatively
bright (V < 13) parent stars thanks to high-cadence, milli-mag
photometric measurements². An additional dozen or so extrasolar
planets have been found residing at d > 300 pc, thanks to
both transit photometry (e.g., Konacki et al. 2003, 2005; Bouchy
et al. 2004; Pont et al. 2004, 2007; Udalski et al. 2007; Collier
Cameron et al. 2006; Mandushev et al. 2007; Kovacs et al. 2007;
[author illegible] 2007) as well as microlensing surveys in the Galactic
bulge (e.g., Bond et al. 2004; Udalski et al. 2005; Gould et al.
2006; Beaulieu et al. 2006).

Send offprint requests to: A. Sozzetti, e-mail: sozzetti@oato.inaf.it
¹ See, for example, Jean Schneider's Extrasolar Planet Encyclopedia
at http://exoplanet.eu/
² For a review, see Charbonneau et al. 2007, and references therein.
For an updated list, see for example
obswww.unige.ch/~pont/TRANSITS.htm, and references therein.

The sample of nearby exoplanets and their host stars is
amenable to follow-up studies with a variety of indirect and
direct techniques, such as high-resolution (visible-light and
infrared) imaging and stellar spectroscopy, and photometric
transit timing (for a review, see for example Charbonneau et al.
2007, and references therein). Milli-arcsecond (mas) astrometry
for bright planet hosts within 200-300 pc provides precise
distance estimates thanks to Hipparcos parallaxes (Perryman et
al. 1997). However, despite a few important successes (Benedict
et al. 2002, 2006; McArthur et al. 2004; Bean et al. 2007),
astrometric measurements with mas precision have so far proved
of limited utility when employed as either a follow-up tool or
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to independently search for planetary mass companions orbiting 
nearby stars (for a review, see for example Sozzetti 2005, and 
references therein). 

In several past exploratory works (Casertano et al. 1996; 
Casertano & Sozzetti 1999; Lattanzi et al. 1997, 2000a, 2000b, 
2002; Sozzetti et al. 2001, 2002, 2003a, 2003b), we have shown
in some detail what space-borne astrometric observatories with
micro-arcsecond (μas)-level precision, such as Gaia (Perryman
et al. 2001) and SIM PlanetQuest (Unwin et al. 2007), can 
achieve in terms of search, detection and measurement of ex- 
trasolar planets of mass ranging from Jupiter-like to Earth-like. 
In those studies we adopted a qualitatively correct description of 
the measurements that each mission will carry out, and we es- 
timated detection probabilities and orbital parameters using re- 
alistic, non-linear least-square fits to those measurements. For 
Gaia, we used the then-current scanning law and error model; 
for SIM, we included reference stars, as well as the target, and 
adopted realistic observational overheads and signal-to-noise es- 
timates as provided by the SIM Project. Other, more recent stud- 
ies (Ford & Tremaine 2003; Ford 2004, 2006; Catanzarite et al. 
2006) have focused on evaluating the potential of astrometric 
planet searches with SIM, revisiting and essentially confirming 
the findings of our previous works. 

Although valid and useful, the studies currently available
need updating and improving. In the specific case of planet de- 
tection and measurement with Gaia, we have thus far largely ne- 
glected the difficult problem of selecting adequate starting val- 
ues for the non-linear fits, using perturbed starting values in- 
stead. The study of multiple-planet systems, and in particular 
the determination of whether the planets are coplanar (within
suitable tolerances), is incomplete. The characteristics of Gaia
have changed, in some ways substantially, since our last work 
on the subject (Sozzetti et al. 2003a). Last but not least, in order
to render the analysis truly independent from the simulations, 
these studies should be carried out in double-blind mode. 

We present here a substantial program of double-blind tests 
for planet detection with Gaia (preliminary findings were re- 
cently presented by Lattanzi et al. (2005)). The results expected 
from this study include: a) an improved, more realistic assessment
of the detectability and measurability of single and multiple
planets under a variety of conditions, parametrized by the
sensitivity of Gaia; b) an assessment of the impact of Gaia in
critical areas of planet research, as a function of its expected
capabilities; and c) the establishment of several Centers with a
high level of readiness for the analysis of Gaia observations rel- 
evant to the study of exoplanets. 

This paper is arranged as follows. In Section 2 we de- 
scribe our simulation setup and clearly state the working as- 
sumptions adopted (the relaxation of some of which might have 
a non-negligible impact on Gaia's planet-hunting capabilities). 
In Section 3 we present the bulk of the results obtained during 
our extensive campaign of double-blind tests. Section 4 attempts 
to put Gaia's potential for planet detection and measurement 
in context, by identifying several areas of planetary science in 
which Gaia can be expected, on the basis of our results, to have 
a dominant impact, and by delineating a small number of recom- 
mended research programs that can be conducted successfully 
by the mission as planned. In Section 5 we summarize our findings
and provide concluding remarks.



2. Protocol definition and simulation setup 

2.1. Double-blind tests protocol

For the purpose of this study, we have devised a specific proto- 
col for the double-blind tests campaign. Initially, three groups of 
participants are identified: 1) the Simulators define and generate 
the simulated observations, assuming specific characteristics of 
the Gaia satellite; simulators also define the type of results that 
are expected for each set of simulations; 2) the Solvers receive 
the simulated data and produce "solutions" — as defined by the 
simulators; solvers define the criteria they adopt in answering 
the questions posed by the simulators; 3) the Evaluators receive 
both the "truth" — i.e., the input parameters — from the simula- 
tors and the solutions from the solvers, compare the two, and 
draw a set of conclusions on the process. 

A sequence of tasks, each with well-defined goals and time 
scales, has been established. Each task requires a separate set of 
simulations, and is carried out in several steps: 

1. Simulation: The Simulators make the required set of simula-
tions available to the Solvers, together with a clear definition 
of the required solutions. 

2. Clarification: A short period after the simulations are made 
available in which the Solvers request any necessary clari- 
fication on the simulations themselves and on the required 
solutions; after the clarification period, there is no contact 
between Simulators and Solvers until the Discussion step. 

3. Delivery: On a specified deadline, the Simulators deliver the
input parameters for the simulations to the Evaluators, and the
Solvers deliver their solutions together with a clear explanation
of the criteria they used (e.g., the statistical meaning
of "detection", or how uncertainties on estimated parameters
were defined).

4. Evaluation: The Evaluators compare input parameters and 
solutions and carry out any statistical tests they find useful, 
both to establish the quality of the solutions and to interpret 
their results in terms of the capabilities of Gaia, if applicable. 

5. Discussion: The Evaluators publicize their initial results. All
participants are given access to input parameters and all solu- 
tions, and the Evaluators' results are discussed and modified 
as needed. 

2.2. Observing scenario 

The simulations were provided by the group at the Torino 
Observatory. The simulations were made available via WWW as 
plain text files. A detailed description of the code for the simu- 
lation of Gaia astrometric observations can be found in our pre- 
vious exploratory works (Lattanzi et al. 2000a; Sozzetti et al. 
2001). We summarize here its main features. 

Each simulation consists of a time series of observations 
(with a nominal mission lifetime set to 5 years) of a sample 
of stars with given (catalog) low-accuracy astrometric param- 
eters (positions, proper motions, and parallax). Each observa- 
tion consists of a one-dimensional coordinate in the along-scan 
direction of the instantaneous great circle followed by Gaia at 
that instant. The initially unperturbed photocenter position of a 
star is computed on the basis of its five astrometric parameters, 
which are drawn from simple distributions, not resembling any 
specific galaxy model. The distribution of two-dimensional po- 
sitions is random, uniform. The distribution of proper motions is 
Gaussian, with dispersion equal to a value of transverse velocity
V_T = 15 km s⁻¹, typical of the solar neighborhood. The
photocenter position can then be corrected for the gravitational 
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perturbation of one or more planetary mass companions. The 
Keplerian motion of each orbiting planet is described via the full 
set of seven orbital elements. For simplicity, all experiments
described here were produced assuming stellar mass M_⋆ = 1 M_⊙.
The resulting astrometric signature (in arcsec) is then expressed
as α = (M_p/M_⋆) × (a_p/d), where M_p is the planet mass, a_p is the
planet orbital semi-major axis, and d is the distance to the system
(in units of M_⊙, AU, and pc, respectively). Simulated observations
are generated by adding the appropriate astrometric noise,
as described in the next section. 
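
As a worked illustration of the signature formula above, the following minimal Python sketch evaluates α for a Jupiter-like planet around a solar-mass star. The function name, the Jupiter-mass constant, and the example system (a_p = 5.2 AU at d = 10 pc) are illustrative assumptions and are not taken from the simulation code; the single-measurement error of 8 μas is the value adopted in this study.

```python
# Astrometric signature alpha = (M_p / M_star) * (a_p / d), in micro-arcsec.
# All names below are hypothetical helpers introduced for illustration only.

M_JUP_IN_MSUN = 9.546e-4   # assumed approximate Jupiter mass in solar masses
SIGMA_PSI_UAS = 8.0        # single-measurement error adopted in this study


def astrometric_signature_uas(m_planet_msun, a_planet_au, d_pc, m_star_msun=1.0):
    """Return alpha in micro-arcsec for masses in M_sun, a_p in AU, d in pc."""
    alpha_arcsec = (m_planet_msun / m_star_msun) * (a_planet_au / d_pc)
    return alpha_arcsec * 1.0e6  # arcsec -> micro-arcsec


# Example: a Jupiter analog (5.2 AU) around a 1 M_sun star at 10 pc.
alpha_uas = astrometric_signature_uas(M_JUP_IN_MSUN, 5.2, 10.0)
snr = alpha_uas / SIGMA_PSI_UAS
print(f"alpha = {alpha_uas:.1f} uas, alpha/sigma_psi = {snr:.1f}")
```

Because α scales as 1/d, the same planet at 200 pc would produce a signature 20 times smaller, which is the kind of scaling that sets the distance horizon of an astrometric planet search.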

Finally, the Gaia scanning law has been updated to the most 
recent expectations (precession angle around the Sun direction
ξ = 50°, precession speed of the satellite's spin axis v_p = 5.22
rev yr⁻¹, spin axis rotation speed v_r = 60 arcsec s⁻¹), which
result in fewer observations and possibly less ability to
disentangle near-degenerate solutions than with the original scanning
law (e.g., Lindegren & de Bruijne 2005). Each star is observed
on average N_obs = 43 times³; note that the simplest star+planet
solution has 12 parameters, and therefore the redundancy of the 
information is not very high. 

2.3. Assumptions and caveats 

The simulated data described above indicate that a number of 
working assumptions have been made. In particular, a variety of 
physical effects that can affect stellar positions have not been in- 
cluded, and a number of instrumental as well as astrophysical 
noise sources have not been considered (for a detailed review, 
see for example Sozzetti 2005). Our main simplifying assump- 
tions are summarized below. 

1) the position of a star at a given time is described in Euclidean 
space. A general relativistic formulation of Gaia-like global 
astrometric observations, which has been the subject of sev- 
eral studies in the recent past (Klioner & Kopeikin 1992; de 
Felice et al. 1998, 2001, 2004; Vecchiato et al. 2003; Klioner 
2003, 2004), has not been taken into account; 

2) we assume that the reconstruction and calibration of individ- 
ual great circles have been carried out, with nominal zero 
errors (i.e., knowledge of the spacecraft attitude is assumed 
perfect). We refer the reader to e.g. Sozzetti (2005), and 
references therein, for a summary of issues related to the 
complex problem of accurately calibrating out attitude errors 
(due to, e.g., particle radiation, thermal drifts, and spacecraft 
jitter) in space-borne astrometric measurements; 

3) the abscissa is only affected by random errors, and no
systematic effects are considered (e.g., zero-point errors,
chromaticity, radiation damage, etc.). A simple Gaussian
measurement error model is implemented, with standard deviation
σ_ψ = 8 μas, which applies to bright targets (V < 13). In
this context, the projected end-of-mission accuracy on
astrometric parameters is 4 μas. Recently, Gaia has successfully
passed the Preliminary Design Review and entered phase
C/D of the mission development. ESA has selected EADS-Astrium
as Prime Contractor for the realization of the satellite.
The scanning law and the astrometric section of the selected
payload, the only elements of relevance here, remain largely
consistent with the assumptions adopted in our simulations. However,
very recent estimates of the error budget indicate a possible
degradation of 35% - 40% (i.e., ~ 11 μas) in the
single-measurement precision, corresponding to a typical final
accuracy of 5 - 5.5 μas for objects in the above magnitude
range, with some dependence on spectral type (red objects
performing closer to specifications). We will address in the
discussion section the possible impact of such performance
degradation on Gaia's planet-hunting capabilities;

4) light aberration, light deflection, and other apparent effects
(e.g., perspective acceleration) are considered as perfectly
removed from the observed along-scan measurements;

5) when multi-component systems are generated around a target,
the resulting astrometric signal is the superposition of
two strictly non-interacting Keplerian orbits. It is recognized
that gravitational interactions among planets can result in
significant deviations from purely Keplerian motion (such as
the case of the GJ 876 planetary system, e.g. Laughlin et al.
2005). However, most of the multiple-planet systems discovered
to date by radial-velocity techniques can be well modeled
by planets on independent Keplerian orbits, at least for
time-scales comparable to Gaia's expected mission duration;

6) a number of potentially important sources of 'astrophysical'
noise, due to the environment or intrinsic to the target,
have not been included in the simulations. In particular, we
have not considered a) the dynamical effect induced by
long-period stellar companions to the targets, b) the possible shifts
in the stellar photocenter due to the presence of circumstellar
disks with embedded planets (Rice et al. 2003a; Takeuchi
et al. 2005), and c) variations in the apparent stellar position
produced by surface temperature inhomogeneities, such as
spots and flares (e.g., Sozzetti 2005, and references therein;
Eriksson & Lindegren 2007).

³ We define as elementary observation the successive crossing of the
two fields-of-view of the satellite, separated by the basic angle γ =
106.5°.

The geometric model of the measurement process is de- 
scribed in detail in the Appendix. 

3. Results 

The double-blind test campaign encompassed a set of experi- 
ments that were necessary to establish a reliable estimate of the 
planet search and measurement capabilities of Gaia under real- 
istic analysis procedures, albeit in the presence of an idealized 
measurement process. In particular, a number of different tasks 
were designed, such that the participating groups would be able 
a) to analyze data produced by a nominal satellite, without taking 
into account the imperfections due to measurement biases, non- 
Gaussian error distributions, imperfections in the sphere solu- 
tion, and so on; b) to convert any Gaussian error model for Gaia 
measurements into expected detection probability — including 
completeness and false positives — and accuracy in orbital pa- 
rameters that can be achieved within the mission; c) to assess 
to what extent, and with what reliability, coplanarity of multiple 
planets can be determined, and how the presence of a planet can 
degrade the orbital solution for another.

We broke down the work plan into three tests: T1, T2, and
T3, whose main results are presented below. To facilitate read- 
ing, we have chosen to provide the summaries of the results con- 
cerning each of the three tests at the beginning of the correspond- 
ing sub-sections. 

3.1. Test T1: Planet detection

Fig. 1. Left: distribution of period and signature for the planets missed by Solver A's broad criterion (A1). If more than one planet
is present, the one with the largest signature is plotted. Right: distribution of period and signature for the planets missed by criterion
B1.

Fig. 2. Inclination and eccentricity of the planets simulated for
the T1 experiment. Black dots are planets with α < 15 μas and
P < 6 yr not detected by the A1 criterion.

Test T1 is designed primarily to establish the reliability and
completeness of planet detections for single-planet systems based on
simulated data, with full a priori understanding of their noise
characteristics. Simulated data were prepared for 100,000 stars.
Of these, 45,202 have no planets, 49,870 have one, 3,878 have two, and
1,050 have three planets. The astrometric signature of each planet
ranges from 0.8 to 80 μas (0.1 < α/σ_ψ < 10), the period P
from 0.2 to 12 years, while eccentricities are drawn from a random
distribution within the range 0.0 < e < 0.9. All other orbital
elements (inclination i, longitude of pericenter ω, pericenter
epoch τ, and position angle of the nodes Ω) were distributed
randomly as follows: 1° < i < 90°, 0° ≤ ω < 360°, 0 ≤ τ < P, and
0° < Ω < 180°. For systems with multiple planets, there was no
specific relationship between periods, phases, or amplitudes of
the planetary signatures. The distribution of planetary signatures
was unknown to the solvers.
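
The parameter ranges quoted for the T1 sample can be turned into a small sampling sketch. This is not the Simulators' code: the text only states the ranges and that the draws are random, so the uniform distributions used below (in particular for signature and period) are an assumption, and the function and key names are hypothetical.

```python
import random

SIGMA_PSI_UAS = 8.0  # single-measurement error adopted in this study


def draw_t1_planet(rng):
    """Draw one planet within the T1 ranges (uniform draws are an assumption)."""
    period_yr = rng.uniform(0.2, 12.0)
    return {
        "alpha_uas": rng.uniform(0.8, 80.0),      # 0.1 < alpha/sigma_psi < 10
        "period_yr": period_yr,
        "ecc": rng.uniform(0.0, 0.9),
        "incl_deg": rng.uniform(1.0, 90.0),
        "omega_deg": rng.uniform(0.0, 360.0),     # longitude of pericenter
        "tau_yr": rng.uniform(0.0, period_yr),    # pericenter epoch within one period
        "node_deg": rng.uniform(0.0, 180.0),      # position angle of the nodes
    }


rng = random.Random(42)  # fixed seed so the sketch is reproducible
planets = [draw_t1_planet(rng) for _ in range(1000)]
n_above_3sigma = sum(p["alpha_uas"] >= 3.0 * SIGMA_PSI_UAS for p in planets)
print(f"{n_above_3sigma} of {len(planets)} draws have alpha >= 3 sigma_psi")
```

With these ranges a sizeable fraction of the simulated planets sits below any plausible detection threshold, which is what allows the test to probe both completeness and false positives.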



On this dataset, solvers were asked to carry out two tests.
Test T1a consisted of identifying the likely planet detections,
based on a single-star analysis and criteria of the Solvers' own
choosing. Test T1b gave the solvers the opportunity to improve
on their planet detection on the basis of an orbital fit, i.e.,
using the knowledge that the deviations from a single-star model
were in fact expected to have the signature of a star-planet
system. Two Solvers participated in this step and provided
completely independent solutions. Solver A attempted to improve
the quality of planet detection using orbital fits, in the spirit of
the T1b test; Solver B was satisfied with the quality of the
detection achieved from the statistical properties of the residuals to
the single-star fit. Although the solvers used different detection
criteria and post-processing analysis, both ultimately achieved
good (and comparable) detection quality, indicating that the
procedures they used are robust and consistent. In particular, the T1
experiment has shown that, at least for the cases under
consideration, detection tests (e.g., χ² or F2) based on deviations from
the single-star astrometric solution perform as well as can be
expected. Planets down to astrometric signature α ≃ 2σ_ψ can be
detected reliably and consistently, with a very small number of
false positives. Even better, the choice of the detection threshold
is an effective way to distinguish between highly reliable and
marginal candidates. Under the assumptions of this test, which
is based on an idealized, perfectly known noise model, potential
planet-bearing stars can be identified and screened reliably.
Refinements of the detection criteria based on additional
considerations, e.g., the quality of the orbital fit, can potentially make
an improvement in the fitting procedure. However, the performance
of a straight χ² or F2 test is already extremely good; such
tests, if properly applied, can yield candidates with the expected
range of sensitivity and with a powerful discrimination against
false positives.



S. Casertano et al.: DBT Campaign for Planet Detection with Gaia 







Fig. 3. Left: same as Figure 2, but here the results are expressed in terms of inclination angle and number of observations. Right:
same as Figure 2, in the e-N_obs plane.



3.1.1. First-pass detection 

Both solvers approach test T1 on the basis of the quality of the
single-star, five-parameter solution for the astrometric measure- 
ments. 

Solver A adopts two criteria to identify candidate planets, 
one broad, aimed at detecting as many candidates as reasonable, 
and one strict, designed to reduce the number of false positives. 
Specifically, Solver A uses P(χ²), the probability that a χ² as
bad as or worse than the value observed for the single-star
solution arises in the presence of pure measurement errors, and
P(F), the F-test probability on the same fit. A large value of χ²
or of the F statistic can readily arise if the deviations due to the
presence of a planet are much larger than the expected measurement
errors, and thus a low value of P(χ²) and P(F) signifies
a likely planet (and an unlikely false positive).

The broad criterion, A1, requires only that P(χ²) < 0.05,
and favors completeness over reliability: many more marginal
candidates are included, but false positives will be more numerous.
The strict criterion, A2, requires both P(χ²) < 0.0001 and
P(F) < 0.0001, and favors reliability over completeness: candidates
satisfying this criterion have a small probability of being
false positives, but many marginal cases will be missed.
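
The χ² part of these criteria can be sketched with the Python standard library alone. The code below is an illustration under stated assumptions, not Solver A's implementation: it evaluates the upper-tail χ² probability through the closed-form finite series valid for integer degrees of freedom, takes the number of degrees of freedom as N_obs minus the five single-star parameters, and omits the P(F) requirement of criterion A2 because the text does not specify how the F statistic was formed. All function names are hypothetical.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability P(chi2 >= x) for an integer dof >= 1."""
    if x <= 0.0:
        return 1.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{k=0}^{dof/2-1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, dof // 2):
            term *= (x / 2.0) / k
            total += term
        return math.exp(-x / 2.0) * total
    # Odd dof: erfc(sqrt(x/2)) + sqrt(2x/pi) exp(-x/2) * sum_k x^k / (2k+1)!!
    term, total = 1.0, 0.0
    for k in range((dof - 1) // 2):
        if k > 0:
            term *= x / (2 * k + 1)
        total += term
    tail = math.erfc(math.sqrt(x / 2.0))
    return tail + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * total


def single_star_flags(chi2_value, n_obs, n_params=5):
    """Apply the chi2-only thresholds to the residuals of a single-star fit."""
    p_chi2 = chi2_sf(chi2_value, n_obs - n_params)
    return {
        "p_chi2": p_chi2,
        "A1": p_chi2 < 0.05,            # broad criterion
        "strict_chi2": p_chi2 < 1.0e-4  # chi2 half of A2 (P(F) part omitted)
    }


quiet = single_star_flags(38.0, 43)     # chi2 equal to dof: consistent with noise
loud = single_star_flags(120.0, 43)     # chi2 far above dof: strong candidate
print(quiet["A1"], loud["A1"], loud["strict_chi2"])
```

The two example calls use the average N_obs = 43 quoted for the scanning law, so both are evaluated with 38 degrees of freedom.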

Criterion A1 identifies 44,914 candidates, of which 42,810
are indeed planets and 2,104 are false positives, close to the 5%
expected from the criterion. On the other hand, 11,988 planets
are missed by this criterion. Typically, the planets missed
have signature smaller than 15 μas or period longer than 5 years
(Figure 1, left panel), high eccentricity and/or close to edge-on
orbits (Figure 2), and relatively small numbers of observations
(N_obs < 40, Figure 3). The performance of this and other criteria
discussed here is summarized in Table 1.

Criterion A2 yields only 28,655 detections, with no false
positives, but misses 26,143 planets: only half of the true planets
are found. Because of the more demanding criteria, planets
with signature up to 30 μas can be missed by this criterion,
regardless of period. Nonetheless, the dramatic drop in false
positives is very important, and would probably favor the stricter
criterion.



A further refinement of Solver A's search criterion is discussed
below. However, it is worth noting that a criterion based
purely on P(χ²) < 0.0001, without the P(F) requirement, would
detect 34,918 planets with only 4 false positives, and
miss 19,880, a substantially better performance at the cost of a
modest number of false positives.

Solver B adopts a similar method, using specifically the F2
indicator (see the Hipparcos Catalogue, vol. 1, p. 112), which
is expected to follow a normal distribution with mean 0 and
dispersion 1. His criterion, B1, requires |F2| > 3, which in
essence is a 3-sigma criterion. With this criterion, Solver B
identifies 37,643 stars correctly as having a planet (or more), while 17,155
are missed and 106 (0.2% of the no-planet sample) are false
positives. Similarly to A1, the missed planets mostly have signature
smaller than 20 μas or period longer than 5 years (Figure 1,
right panel). The overall distribution is similar to that of planets
missed by A1, although more marginal cases are excluded, and
fewer false positives are included.
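
For reference, the F2 indicator can be written in a few lines. The sketch below uses the cube-root (Wilson-Hilferty) transformation of χ² with which the Hipparcos Catalogue defines F2; taking the number of degrees of freedom as N_obs minus the five single-star parameters is an assumption made here for illustration, and the function name is hypothetical rather than Solver B's.

```python
import math


def f2_statistic(chi2_value, dof):
    """F2 = sqrt(9*dof/2) * [ (chi2/dof)^(1/3) + 2/(9*dof) - 1 ]."""
    ratio = chi2_value / dof
    return math.sqrt(4.5 * dof) * (ratio ** (1.0 / 3.0) + 2.0 / (9.0 * dof) - 1.0)


dof = 43 - 5                          # average N_obs minus five astrometric parameters
f2_quiet = f2_statistic(38.0, dof)    # chi2 equal to dof: F2 close to zero
f2_loud = f2_statistic(120.0, dof)    # chi2 far above dof: F2 well beyond 3
print(f"F2 = {f2_quiet:.2f} (quiet), {f2_loud:.2f} (candidate)")
print("B1 detection:", abs(f2_loud) > 3.0)
```

Since F2 is approximately a unit normal variable for a well-fitted single star, the threshold |F2| > 3 maps directly onto an expected false-alarm rate, which is what makes the criterion easy to calibrate.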

Criterion B1 appears to be preferable to A1, which finds
5,000 more planets at the cost of nearly 2,000 false positives.
If a 0.2% incidence of false positives is considered acceptable,
the performance is also better than that of A2, with nearly 9,000
more planets found at a modest cost in false positives. However,
the simple P(χ²) < 0.0001 criterion finds nearly as many planets,
with a much smaller fraction of false positives. In practice, the
choice between these criteria would depend on the specific
application and sample properties. For example, for the simulated
data studied here, a fine-tuned P(χ²) test, e.g., with threshold
set at 0.001 (C1), would find 37,714 valid candidates (about as
many as B1 and 2,000 fewer than A4, discussed below) and only
68 false positives.

3.1.2. Refining the detection criteria

Realizing that his strict criterion (which requires both P(χ²) <
0.0001 and P(F) < 0.0001) may be too stringent, while the simple
P(χ²) < 0.05 criterion is expected to allow too many false
positives, Solver A attempts a detection refinement based on the
quality of the orbital fit, in the spirit of the T1b test.
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Table 1. Summary of detection probability 



Name  Criterion                                  Detections                 Missed
                                                 Total    True     False

A1    P(χ²) < 0.05                               44,914   42,810   2,104    11,988
A2    P(χ²) < 0.0001 and P(F) < 0.0001           28,655   28,655   0        26,143
A3    P(χ²) < 0.0001, or                         40,196   39,630   566      15,168
      P(χ²) < 0.05 and P(χ²)_orbital > 0.2
A4    P(χ²) < 0.0001, or                         39,957   39,768   189      15,030
      P(χ²) < 0.05 and P(Δχ²) < 0.001
B1    |F2| > 3                                   37,749   37,643   106      17,155
C1    P(χ²) < 0.001                              37,782   37,714   68       17,084
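
The counts in Table 1 can be converted into completeness and false-positive rates with a few lines of arithmetic. The sketch below only re-uses numbers quoted in the text (45,202 stars without planets; 49,870 + 3,878 + 1,050 stars with at least one planet); the variable and function names are illustrative, and the zero false positives for A2 follow the statement that this criterion yields none.

```python
# (true detections, false detections, missed) for each criterion of Table 1.
TABLE_1 = {
    "A1": (42_810, 2_104, 11_988),
    "A2": (28_655, 0, 26_143),
    "A3": (39_630, 566, 15_168),
    "A4": (39_768, 189, 15_030),
    "B1": (37_643, 106, 17_155),
    "C1": (37_714, 68, 17_084),
}

N_WITH_PLANETS = 49_870 + 3_878 + 1_050   # stars with one, two, or three planets
N_WITHOUT_PLANETS = 45_202                # stars with no planets


def rates(true_det, false_det, missed):
    """Return (completeness, false-positive rate among planet-less stars)."""
    completeness = true_det / (true_det + missed)
    false_positive_rate = false_det / N_WITHOUT_PLANETS
    return completeness, false_positive_rate


summary = {name: rates(*counts) for name, counts in TABLE_1.items()}
for name, (comp, fpr) in summary.items():
    print(f"{name}: completeness = {comp:.1%}, false-positive rate = {fpr:.2%}")
```

For every criterion the true detections and the missed planets add up to the same number of planet-bearing stars, which provides a simple consistency check on the table.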



For the purpose of this test, Solver A considers the
"marginal" candidates with 0.0001 < P(χ²) < 0.05; of these,
2,100 are in fact false positives, while 7,892 have a real planet. In
this case, there are 34,918 non-marginal true detections (those with
P(χ²) < 0.0001), accompanied by only 4 false detections.

The first refinement (A3) is based on the quality of the orbital
fit: a marginal candidate passes if the χ² statistic of the residuals
after the orbital fit improves to P(χ²) > 0.2 (an improvement by at
least a factor of 4). A total of 5,274 marginal candidates pass this
test; of these, 11% are false detections. Of the marginal candidates
that do not pass the refinement, 33% are false positives.
Thus, this orbital refinement does improve the probability that
the candidate is real, and can in fact increase the sample of
possible candidates (see Table 1).

The second refinement for the marginal candidates (A4) is
based on the likelihood ratio test applied to the two fits, with or
without the planet. For a candidate to pass, the fit with the planet
is required to improve the χ² with a probability better than 0.001,
i.e., P(Δχ²) < 0.001. Of the 5,035 stars that pass, 96% do in
fact have a planet; only 185 are false positives. The likelihood
ratio improvement appears to perform significantly better than
the simpler test based on the new χ² probability (see Table 1).
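
The decision logic of the two refinements can be summarized as a small function. This is a sketch of the thresholds quoted above, not Solver A's code: the three probabilities are assumed to have been computed beforehand (from the single-star fit, from the residuals after the orbital fit, and from the likelihood ratio test), and the function and argument names are hypothetical.

```python
def refined_detection(p_single, p_orbital, p_delta):
    """Return (passes_A3, passes_A4) for one star.

    p_single  : P(chi2) of the single-star fit
    p_orbital : P(chi2) of the residuals after the orbital fit
    p_delta   : P(Delta chi2) from the likelihood ratio test
    """
    non_marginal = p_single < 1.0e-4
    marginal = 1.0e-4 <= p_single < 0.05
    passes_a3 = non_marginal or (marginal and p_orbital > 0.2)
    passes_a4 = non_marginal or (marginal and p_delta < 1.0e-3)
    return passes_a3, passes_a4


examples = {
    "solid detection": refined_detection(1.0e-6, 0.50, 1.0e-8),
    "marginal, good orbital fit": refined_detection(0.01, 0.50, 0.01),
    "marginal, strong chi2 improvement": refined_detection(0.01, 0.10, 1.0e-4),
    "no significant deviation": refined_detection(0.20, 0.90, 1.0e-6),
}
for label, (a3, a4) in examples.items():
    print(f"{label}: A3 = {a3}, A4 = {a4}")
```

Non-marginal candidates pass both refinements regardless of the orbital fit, so A3 and A4 differ only in how they treat the marginal sample.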

The refined criteria, especially A4, do improve substantially 
on A1, bringing its performance in line with that of B1. A4 finds
about 2,000 more candidates than B1, but 83 more false positives.
B1 is simpler to apply, and the expected distribution of the
F2 statistic is well-defined in the case of stars without planets; 
this makes it possible to clearly label those candidates that are 
most likely to be false positives, and therefore to derive samples 
with different levels of confidence for different purposes. On the 
other hand, A4 offers the potential to detect more stars, includ- 
ing potentially some stars with relatively small signatures but a 
good orbital fit, without an excessive increase in the number of 
false positives. Neither approach offers the freedom from false 
detections of A2, which however comes at the cost of fewer can- 
didates. 

It may be worthwhile considering orbital fit criteria as a 
means to improve the detection statistics for a more tightly se- 
lected initial sample. For example, one could consider a likeli- 
hood ratio threshold that depends on the original P(x^), so that 
more marginal candidates (with a greater probability of being 
false positives) are held to a stricter likelihood ratio requirement. 
Conceivably, such requirements could achieve a better combination
of sensitivity and reliability than straight χ² or F2 tests.
However, their investigation is beyond the scope of this analysis; 
a new set of tests would be needed to assess such techniques in 
true double-blind fashion. 



3.2. Test T2: Single-planet orbit determination 

The T2 experiment is designed primarily to establish the accu- 
racy of the orbital determination for single planets with solidly 
detected signatures, under the assumption that the noise charac- 
teristics of the data are fully understood. Solvers knew that each 
star had one planet, but did not know the distribution of signa- 
tures and periods. The T2 test determines how well the orbital 
parameters of a single planet can be measured for a variety of 
signature significance, period, inclination, and other parameters. 
Simulated data were prepared for 50,000 stars, each with exactly
one planet with signatures ranging between 16 μas (astrometric
signal-to-noise α/σ_ψ = 2) and 1.6 mas (α/σ_ψ = 200) and periods
between 0.2 and 12 years; all other orbital parameters were
randomly distributed with the same prescriptions as in Test T1.
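
As a rough illustration of the scale of these signatures, the sketch
below converts an astrometric signature into the signal-to-noise ratio
used above. It assumes a single-measurement error σ_ψ = 8 μas, the
value implied by 16 μas corresponding to α/σ_ψ = 2, and the usual
definition α = (M_p/M_*)(a_p/d), which gives α in arcsec for a_p in AU
and d in pc. The function names and the adopted Jupiter-to-solar mass
ratio are illustrative choices, not part of the test setup.

```python
# Assumed single-measurement error, implied by 16 uas <-> S/N = 2.
SIGMA_PSI_UAS = 8.0
# Assumed Jupiter mass in solar masses (approximate value).
MJUP_IN_MSUN = 1.0 / 1047.35


def astrometric_signature_uas(mp_mjup, mstar_msun, a_au, d_pc):
    """Astrometric signature alpha in micro-arcseconds.

    alpha [arcsec] = (M_p / M_star) * (a_p [AU] / d [pc])
    """
    mass_ratio = mp_mjup * MJUP_IN_MSUN / mstar_msun
    return mass_ratio * a_au / d_pc * 1.0e6


def astrometric_snr(alpha_uas, sigma_psi_uas=SIGMA_PSI_UAS):
    """Astrometric signal-to-noise ratio alpha / sigma_psi."""
    return alpha_uas / sigma_psi_uas


if __name__ == "__main__":
    # A Jupiter analogue (1 M_J at 5.2 AU) around a solar-mass star at 10 pc.
    alpha = astrometric_signature_uas(1.0, 1.0, 5.2, 10.0)
    print(f"alpha = {alpha:.0f} uas, S/N = {astrometric_snr(alpha):.0f}")
```

With these assumptions, the T2 range of 16 μas to 1.6 mas maps onto
signal-to-noise ratios between 2 and 200.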

Each solver was asked to carry out a full orbital reconstruc- 
tion analysis for each star, beginning from the period search and 
including error estimates for each of the orbital parameters. As 
for the T1 test, two solvers, A and B, participated in this test,
each with their independently developed numerical code. The 
first, obvious conclusion is that both solvers achieve very good 
results, recovering very solidly the orbital parameters of the vast 
majority of 'good' cases - those with high astrometric signature 
and period shorter than the mission duration. In addition, their 
results are extremely consistent, indicating the robustness of the 
procedures they developed and of the overall approach. 

Both Solvers run their respective pipelines, consisting of de- 
tection, initial parameter determination, and orbital reconstruc- 
tion, on each of the 50,000 simulated time series provided by 
the Simulators. They have no a priori knowledge of the orbital 
properties of each planet, although they do know that each star 
is expected to have one and only one planet. 

In both cases, solvers use the equivalent of a least-squares 
algorithm to fit the astrometric data for each planet; they need to 
solve for the star's basic astrometric information (position, par- 
allax, proper motion), for which only low-accuracy catalog pa- 
rameters are provided, as well as for the parameters of the reflex 
motion. Solver B provides orbital solutions expressed in terms
of P, e, τ, and the four Thiele-Innes parameters A, B, F, G (e.g.,
Green 1985). He also provides estimated uncertainties for each
parameter and the full covariance matrix. Solver A also provides
P, e, and τ, but instead of the Thiele-Innes parameters, he returns
a, i, Ω, and ω. He computes formal errors for each parameter, but
not the covariance matrix.
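
The two parameter sets carry the same geometric information. As an
illustration, the following sketch applies the standard relations
between the Thiele-Innes constants and the elements a, i, Ω, ω. It is
not the code used by either Solver, and the function names are
hypothetical. Astrometry alone cannot distinguish (ω, Ω) from
(ω + π, Ω + π), so the inverse conversion adopts the convention
0 ≤ Ω < π.

```python
import math


def campbell_to_thiele_innes(a, inc, node, argp):
    """Return (A, B, F, G) from a, i, Omega, omega (angles in radians)."""
    co, so = math.cos(argp), math.sin(argp)
    cn, sn = math.cos(node), math.sin(node)
    ci = math.cos(inc)
    A = a * (co * cn - so * sn * ci)
    B = a * (co * sn + so * cn * ci)
    F = a * (-so * cn - co * sn * ci)
    G = a * (-so * sn + co * cn * ci)
    return A, B, F, G


def thiele_innes_to_campbell(A, B, F, G):
    """Return (a, i, Omega, omega) with Omega chosen in [0, pi)."""
    u = 0.5 * (A * A + B * B + F * F + G * G)  # a^2 (1 + cos^2 i) / 2
    v = A * G - B * F                          # a^2 cos i
    a = math.sqrt(u + math.sqrt(max(u * u - v * v, 0.0)))
    inc = math.acos(max(-1.0, min(1.0, v / (a * a))))
    angle_sum = math.atan2(B - F, A + G)       # omega + Omega
    angle_diff = math.atan2(-B - F, A - G)     # omega - Omega
    argp = 0.5 * (angle_sum + angle_diff)
    node = 0.5 * (angle_sum - angle_diff)
    if node < 0.0:
        # Apply the (omega + pi, Omega + pi) degeneracy to get Omega >= 0.
        node += math.pi
        argp += math.pi
    return a, inc, node, argp % (2.0 * math.pi)


if __name__ == "__main__":
    constants = campbell_to_thiele_innes(2.5, 0.7, 1.0, 4.0)
    print(thiele_innes_to_campbell(*constants))
```

Because A, B, F, and G enter the model linearly once P, e, and τ are
fixed, this representation is convenient for fitting, while a, i, Ω,
and ω are easier to interpret physically.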

Solver B reports no solution for 521 stars, about 1% of the 
total. Solver A reports a solution for all stars, but 69 are invalid 
as the estimated error in the orbital parameters is undefined; we 
exclude these objects from further consideration. In addition, a 
few tens of objects have very large errors, and may not be
meaningful. It is important to note that for both solvers the number
of such cases is very small, and, as they are identified during
the solution process, they present no risk of contaminating the
search for planets; they simply reflect the limitations of the
observations.

Fig. 6. Distribution of estimated periods and their errors for
orbits with signature larger than 0.4 mas as a function of true
period. The lines with error bars show the median and
interquartile range for the period estimated by Solver A (solid)
and B (dashed). The lines without error bars represent the median
estimated errors from the fitting procedure for Solver A (solid)
and B (dashed).

3.2.1. Retrieving orbital parameters 

The orbital period is perhaps the most important of the orbital 
parameters, and generally the most critical in terms of obtain- 
ing an orbital solution that is close to the truth. Period search is 
usually a delicate process, and aliasing, especially for relatively 
sparsely sampled orbits, can be a serious concern. Therefore the 
evaluation of the solutions starts with the orbital period. In sum- 
mary, the period is retrieved with very good accuracy and small 
bias for true periods ranging from 0.3 to 6 years. A small fraction 
of very short and very long periods are aliased to very different 
periods; these cannot be readily identified by simply inspecting 
the estimated errors. Long periods are systematically underesti- 
mated; this trend is predictable on the basis of simulations, and 
the amount of bias is comparable to the estimated period error. 

Figure 4 shows the quality of the match between the true period
and the solution by Solver B (Solution 1) and by Solver A
(Solution 2). For the 20,411 stars with true period shorter than
5 years, both solvers recover over 98% with a fractional error in
the period of 10% or smaller (20,054 for Solver A, 20,158 for
Solver B). This includes a few cases (45 for Solver A, 27 for
Solver B) for which no valid solution is returned. Almost all the
cases with poor period determination have either very small sig- 
natures or periods shorter than 3 months, for which aliasing can 
occur with the relatively sparse sampling of the Gaia scanning 
law. Such cases are rare, no more than 2% of all short-period 
planets, but are not readily identified by the nominal error in the 
period. Short-period solutions will probably need to be looked 
at more carefully to eliminate the possibility of aliasing in the 
solution. 

While fidelity is extremely good for planets with true period
ranging from a few months to the mission lifetime, the quality
of the solution degrades quickly for periods longer than the
mission duration. Visually, it is clear that, for a given amplitude
of the perturbation, the ability to recover the planet's period
with modest errors starts degrading at periods of about 6 years.
Note also that for very long true periods, the fitted period is
systematically shorter than the truth; at 10 years, the typical
recovered period is substantially shorter, about 7 years, with a
very large dispersion. In a small number of cases (418 for Solver A,
150 for Solver B), a very small period is fitted to a long-period
object (resulting in the small cloud of points near the P = 0 axis
in both panels of Figure 4), indicating that the fit has aliased
into a completely different range.

Figure 5 shows the error in the period, as estimated by each
Solver, as a function of true period. Like the period difference,
the estimated error also increases greatly with increasing period,
and in fact the estimated uncertainties are comparable with the
error in the fitted period shown in Figure 4.

The comparison between error in fitted period and estimated
error is shown in a more quantitative way in Figure 6. The curves
and error bars illustrate the median and quartiles of the fitted
period distribution in bins of true period, solid for Solver A and
dotted for Solver B; the thin diagonal dashed line corresponds to
exact solutions. As can be clearly seen, the period solution is
very good, without indication of significant bias, up to about 6
years, beyond which the solution underestimates the period. The
median estimated errors (lower curves) match the interquartile
range reasonably well.
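
Binned statistics of this kind could be computed along the following
lines. This is a minimal sketch, assuming 1-year bins in true period
and plain Python lists as input; the function name, the bin width, and
the minimum number of points per bin are illustrative and are not
taken from the analysis itself.

```python
import statistics


def binned_quartiles(true_periods, fitted_periods, bin_width=1.0, min_count=4):
    """Quartiles of the fitted period in bins of true period.

    Returns a dict mapping the lower edge of each bin to the tuple
    (first quartile, median, third quartile) of the fitted periods.
    """
    bins = {}
    for p_true, p_fit in zip(true_periods, fitted_periods):
        index = int(p_true // bin_width)
        bins.setdefault(index, []).append(p_fit)

    result = {}
    for index in sorted(bins):
        values = bins[index]
        if len(values) < min_count:
            continue  # too few points for meaningful quartiles
        q1, median, q3 = statistics.quantiles(values, n=4)
        result[index * bin_width] = (q1, median, q3)
    return result


if __name__ == "__main__":
    true_p = [0.5] * 7 + [1.5] * 7
    fitted_p = [0.48, 0.49, 0.50, 0.50, 0.51, 0.52, 0.53,
                1.45, 1.48, 1.50, 1.50, 1.52, 1.55, 1.60]
    for edge, (q1, med, q3) in binned_quartiles(true_p, fitted_p).items():
        print(f"bin starting at {edge:.1f} yr: median {med:.2f}, IQR {q3 - q1:.2f}")
```

The interquartile range in each bin can then be compared directly with
the median of the formal period errors reported for the same bin.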

Figure 7 shows how the period accuracy varies with signature
for periods around 1, 3, 5, and 6 years. In each case, larger
signatures mean a stronger astrometric signal, and thus better
accuracy; the distribution of errors matches well the estimate from
the solution itself. In each panel, the blue dots (scale to the left)
represent the difference between fitted and true period as a
function of true signature in the stated period range, and the red dots
(scale to the right) show the error as estimated by the solver for 
that particular orbit. The solid lines and points represent the me- 
dian values for a 0.2 mas bin in signature; the error bars for the 
period error show the range between the first and third quartile 
in each bin. Panels to the left refer to solutions by Solver A, to 
the right by Solver B. In each panel and for each signature bin, 
the median estimated error (red triangles) is very close to the 
difference between median and quartile error for the same set of 
solutions, indicating that the estimated errors are a good guide to 
the true errors. The median of the difference between fitted and 
true period (blue squares) is generally small, showing that there 
is very little bias in the period estimate. 

The other orbital parameters are similarly well estimated for
the vast majority of "good" orbital solutions, excluding those
with low signature and long period. For example, Figure 8
compares the eccentricity fitted by the two Solvers with the true
value for all stars with period shorter than 5 years and signature
larger than 0.4 mas, which corresponds to the top 75% in signature.
Similarly, Figure 9 shows the true and fitted (by Solver B)
values of the Thiele-Innes parameters A and B for the same cases.
Clearly both sets of parameters represent high quality orbital fits.
Other orbital parameters follow similar patterns.

Fig. 4. Distribution of fitted period as a function of true period for Solver A (left) and B (right).

Fig. 5. Distribution of estimated error in the period as a function of true period for Solver A (left) and B (right).

Finally, it is worth mentioning how subtle differences in the
orbital solutions carried out by the two solvers can be seen if
one focuses on regimes of relatively low astrometric signal. We
show for example in Figure 10 the comparison between the
distributions of fitted P and e for Solver A and Solver B in cases of
5 < α/σ_ψ < 10 and 3 < α/σ_ψ < 5, restricting ourselves to
good solutions for which P is within 10% of the true value and
e differs from the true value by no more than 0.1. On the one
hand, from the Figure it is clear how both solvers identify and
measure precisely orbital periods for virtually the same stars; a
Kolmogorov-Smirnov (K-S) test gives a probability that the two
distributions are the same of 0.15 and 0.98 for the two regimes
of signal strength investigated. On the other hand, the
distributions of well-measured eccentricities are significantly
different, with the K-S test giving a probability of the null
hypothesis of 0.04 and 0.005, respectively. The most obvious feature
is the increase in the number of very large eccentricity values
(e > 0.6) correctly identified by Solver A with respect to Solver B.
In particular, in the range 3 < α/σ_ψ < 5 Solver A measures
accurately the eccentricity for some 23% more stars than Solver
B. A possible explanation for this discrepancy may be found in
the different approaches the two solvers adopt to reach the
configuration of initial starting guesses for the parameters in the
orbital fits. Both solvers tackle this issue by implementing a
two-tiered strategy consisting of a combined global+local
minimization procedure. Solver A uses a methodology similar to that
described in Konacki et al. (2002), in which a Fourier expansion
of the Keplerian motion is used to derive initial guesses of the
full set of orbital elements, subsequently utilized in a local
non-linear least-squares analysis. Instead, Solver B adopts a scheme
in which a guess to P is obtained using a period-search technique
(e.g., Horne & Baliunas 1986), and then an exploration of
the (e, τ)-space is carried out to derive the linear parameters A,
B, F, and G as the unique minimizer of χ² when e, P, and τ
are fixed (e.g., Pourbaix 2002). However, for highly non-linear
fitting procedures with a large number of model parameters the
statistical properties of the solutions are not at all trivial (and
significantly differ from those of linear models). A serious study of
differences in the fitting procedures adopted by the two Solvers
would require, for example, an in-depth analysis of the relative
agreement between a variety of statistical indicators of the
quality and robustness of the fits. Such a study lies beyond the scope
of this work, and we leave it for future investigations.

Fig. 7. Error in period as a function of astrometric signature for different period ranges and for both solvers. The dots show the
difference between fitted and true period (blue, left axis) and the estimated uncertainty from the solution (red, right axis). Shown on
the left panels are the solutions by Solver A, and on the right those by Solver B. Heavy dots represent the median values, binned in
astrometric signature; error bars represent the interquartile range.
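
For readers who wish to reproduce a comparison of the kind quoted above
for the period and eccentricity distributions, a two-sample K-S test
can be sketched as follows. This is a minimal standard-library
implementation using the usual asymptotic approximation for the
p-value; it is not the code used in this work, and the function name is
hypothetical. For small samples an exact treatment would be preferable.

```python
import math


def ks_two_sample(sample_a, sample_b):
    """Two-sample K-S statistic D and asymptotic p-value."""
    xs, ys = sorted(sample_a), sorted(sample_b)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    # Walk through both sorted samples, tracking the largest distance
    # between the two empirical cumulative distributions.
    while i < n and j < m:
        value = min(xs[i], ys[j])
        while i < n and xs[i] <= value:
            i += 1
        while j < m and ys[j] <= value:
            j += 1
        d = max(d, abs(i / n - j / m))

    n_eff = n * m / (n + m)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.3:
        # The alternating series converges poorly here; the survival
        # function differs from 1 by less than about 1e-5.
        return d, 1.0
    p_value = 0.0
    for k in range(1, 101):
        term = 2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        p_value += term
        if abs(term) < 1e-12:
            break
    return d, min(max(p_value, 0.0), 1.0)


if __name__ == "__main__":
    d, p = ks_two_sample([0.1, 0.2, 0.35, 0.5, 0.62], [0.15, 0.3, 0.4, 0.55, 0.7])
    print(f"D = {d:.2f}, p = {p:.2f}")
```

A small p-value indicates that the two samples are unlikely to be drawn
from the same parent distribution, which is the sense in which the
probabilities above are quoted.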

3.2.2. Estimated and actual errors 

A more quantitative analysis of the fitted parameters can be car- 
ried out by comparing the distribution of differences between 
true and fitted parameters with the errors estimated as part of the 
solution process. The distribution of differences can be used to 
determine the actual uncertainties in the fit, which in the ideal 
case would match the uncertainties estimated by the fit. In
reality, this is not a perfect process; the estimated uncertainties
are based on noisy data, and therefore tend to be biased towards
smaller values when the noise produces an apparently larger sig- 
nal. Nonetheless, a general agreement between estimated and ac- 
tual errors is to be expected for a good fitting process. 

The results presented in this Section demonstrate that both
Solvers are not only capable of recovering the expected signal
for the overwhelming majority of the simulated orbits under the
conditions of the T2 test (as shown in the previous Section), but
also that error estimates are generally accurate, with the overall
distribution of the difference between fitted and true parameters
very close to the solution results. Some discrepancies (a bias of
up to 2 sigma in estimated period and a mismatch of up to a
factor 2 in estimated errors) do occur under special circumstances,
such as very short and very long periods. These discrepancies,
minor in the context of this test, can be evaluated and
corrected for by a more thorough understanding of the estimation
process and its error estimates. An incorrect solution is returned
for about 2% of the planets. Such cases are not identified from
their formal error estimates, and will need to be addressed by a
more thorough analysis of possible aliasing in orbital parameter
space. Simulations and solutions show conclusively that
correct solutions with accurate error estimates can be obtained
for about 98% of the simulated planets.

Fig. 8. Fitted vs. true orbital eccentricity for Solver A (left) and Solver B (right). Included are the orbits with signature larger than
0.4 mas (approximately 75% of the cases studied) and period shorter than 5 years.

Fig. 9. Fitted vs. true values (in mas) of the Thiele-Innes parameters A and B, according to the solution by Solver B. As in Figure 8, included
are orbits with α > 0.4 mas and P < 5 yr.

Indeed, the estimated and actual errors do match with good 
accuracy under most conditions. An indication can be seen in 
Figure 6, where we show that the typical difference between true
and fitted period, as estimated from the interquartile range, is 
very close to the median estimated uncertainty for diverse values 
of orbital period and amplitude. 



A more quantitative — and challenging — test can be carried 
out by studying the distribution of differences in the parameters 
compared with their predicted errors. Since predicted errors can 
in principle depend on the amplitude of the signature, period, 
times of observation, and other orbit details, we define the scaled 
difference as the difference between the fitted and the true value 
of an orbital parameter, divided by its estimated uncertainty for 
that same case. If the errors are predicted correctly and follow 
a Gaussian distribution, this quantity will also be distributed 
normally with zero mean and unit dispersion. Discrepancies be- 
tween predicted and actual errors will show up as distortions in 
this distribution. 
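
The scaled difference defined above can be illustrated with a short
sketch. The simulated "fits" below are drawn from a Gaussian whose
width equals the quoted uncertainty, so the scaled differences should
come out with mean close to zero and dispersion close to one. The
function names, the sample size, and the 0.005-year error are
illustrative assumptions, not the actual T2 data.

```python
import random
import statistics


def scaled_differences(fitted, true, sigma_est):
    """(fitted - true) / estimated uncertainty, case by case."""
    return [(f - t) / s for f, t, s in zip(fitted, true, sigma_est)]


def simulate_fits(n, true_value=1.0, sigma=0.005, seed=0):
    """Toy data: fitted values scattered around the truth with width sigma."""
    rng = random.Random(seed)
    true = [true_value] * n
    sigma_est = [sigma] * n
    fitted = [rng.gauss(true_value, sigma) for _ in range(n)]
    return fitted, true, sigma_est


def summarize(z):
    """Mean and sample standard deviation of the scaled differences."""
    return statistics.mean(z), statistics.stdev(z)


if __name__ == "__main__":
    fitted, true, sigma_est = simulate_fits(20000, seed=1)
    mean, spread = summarize(scaled_differences(fitted, true, sigma_est))
    print(f"mean = {mean:+.3f}, dispersion = {spread:.3f}")
```

A mean significantly different from zero would indicate a bias in the
fitted parameter, while a dispersion different from one would indicate
over- or underestimated formal errors.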

The expectation of a good error distribution should hold pri- 
marily for the cases with good signal and solid orbit reconstruc- 
tion, for which the true and the reconstructed orbits are close. 
We therefore focus on planets with P < 5 years and α > 0.4
mas, about 20,000 cases.

Fig. 10. Top left and right: distributions of well-measured values of P and e for the two Solvers in the case of 5 < α/σ_ψ < 10.
Bottom left and right: the same, but for the case of 3 < α/σ_ψ < 5.



Figure 11 shows a definite distortion of the overall scaled
difference in period for both Solver B (blue) and Solver A (red); 
the width of the distribution is similar to the predicted value 
(dashed), but the peak is shifted towards positive values (i.e., 
the fitted value of the period is statistically biased towards pos- 
itive errors, or longer periods). The difference is small, about 
0.5-sigma, but it is nonetheless statistically significant because 
of the large number of simulations used. 

The difference in period appears to be a function of the period
itself. When considering planets with different periods, it
appears that the period difference decreases for longer periods,
and vanishes at ~ 5 years. This appears clearly in Figure 12,
where the median and interquartile scaled period difference is
binned as a function of period for both solutions. Periods shorter
than 5 years are overestimated, while longer periods are
underestimated. The difference remains comparable to the estimated
error (one sigma), except for periods around 1 year and shorter,
which are overestimated by up to 2 sigma. We remind the reader
that this is in part a result of the errors being very small; the
typical period error at 1 year is 0.005 years (see Figure 7), so as a
fraction of the period itself, this bias is typically less than a
percent. Nonetheless, the fact that the difference is systematic and
present in both solutions suggests that there is a conceptual issue
worthy of further analysis.

We next consider the distribution of linear parameters, using
the Thiele-Innes B parameter in the Solver B solution as an
example. The overall distribution of scaled errors is, not
surprisingly, unbiased in the mean, and is comparable in width to the
expected distribution (Figure 13, left panel). However, the
observed distribution does differ from the nominal Gaussian, both
for small and for large errors. The core of the distribution
appears narrower than the Gaussian, indicating that errors may be
overestimated for part of the distribution; on the other hand, the
elevated tails (and the 2% of solutions that fall outside the
5-sigma range of the histogram) indicate that errors are
underestimated for some objects.

Fig. 12. Scaled period differences for Solver A (left) and Solver B (right), for all orbits with signature larger than 0.4 mas. The curve
and error bar represent the median and quartiles in 1-year bins in true period.

Fig. 13. Distribution of scaled difference in the Thiele-Innes parameter B for the Solver B solution. The left panel shows all data
points; the right panel only the planets with signature larger than 0.4 mas and period shorter than 5 years. The dashed curve in each
plot is a reference Gaussian with zero mean and unit dispersion.

A closer analysis shows that the narrow peak is due primarily 
to planets with small signatures (< 0.4 mas) and periods shorter 
than 5 years, while the tails are largely due to long-period
planets. Figure 13, right panel, shows that the distribution of scaled
differences for B for all planets with signature larger than 0.4
mas and period shorter than 5 years is very close to Gaussian,
although about 2% of outliers remain.

3.3. Test T3: Multiple-planet solutions and coplanarity 

The T3 experiment is designed primarily to determine how well 
multiple-planet systems can be identified and solved for, as well 
as how well the mutual inclination angle of pairs of planetary 
orbits can be measured. In addition, the accuracy of multiple- 
planet solutions will be compared with that of single-planet so- 
lutions for systems with similar properties. The noise character- 
istics of the data are assumed to be fully understood. 

Each solver was asked to carry out a full orbital reconstruction
analysis for each star, beginning from the period search and
including error estimates for each of the orbital parameters. As
for the T1 and T2 tests, two solvers, Solver A and Solver B,
participated in this test, each with their independently developed
numerical code.

Fig. 11. Histogram of scaled period differences for planets with
period between 1 and 5 years and signature larger than 0.4 mas.
The red histogram is for Solver A, blue for Solver B. The dashed
lines represent a Gaussian distribution with zero mean and unit
dispersion.

Simulated data were prepared for 3,000 stars, in two separate
experiments (T3a and T3b). In the two cases, respectively
310 and 307 objects had one planet, while the remaining 2690
and 2693 had two planets. In both experiments, astrometric
signatures ranged between α = 16 μas (astrometric signal-to-noise
α/σ_ψ = 2) and α = 400 μas (α/σ_ψ = 50). The first planet was
always generated with a mass M_p = 1 M_J, and with P uniformly,
randomly distributed between 0.2 and 9 years. The second planet
was constrained to have P at least a factor 2 shorter or longer
than the first planet, and its corresponding mass was assigned
so as to produce an astrometric signal falling in the
above-mentioned range. The orbital eccentricity was randomly distributed,
but limited to the ranges 0.1 < e < 0.6 and 0.0 < e < 0.6 in the
T3a and T3b experiments, respectively. In the first experiment,
no constraints were placed on the value of the mutual inclination
angle i_rel between pairs of planetary orbits. In the second
experiment, it was constrained to be i_rel < 10°.

Both Solvers run their respective pipelines, consisting of de- 
tection, initial parameter determination, and orbital reconstruc- 
tion, on each of the 3,000 simulated time series provided by the 
Simulators. They have no a priori knowledge of the orbital
properties of each planet, nor do they know whether a star has none,
one, or more planets.

In both cases, solvers use the equivalent of a least-squares 
algorithm to fit the astrometric data for each planet; they need 
to solve for the star's basic astrometric information (position, 
parallax, proper motion), for which only low-accuracy catalog 
parameters are provided, as well as for the parameters of the
reflex motion, for each detected companion. For all planets fitted
for, Solver B provides the results in the form of period P,
eccentricity e, epoch of pericenter passage τ, and the Thiele-Innes
parameters A, B, F, G. He also provides estimated uncertainties
for each parameter. Solver A also provides period, eccentricity,
and pericenter passage, but instead of the Thiele-Innes
parameters, he returns semi-major axis a, inclination i, position angle of
the ascending node Ω, and longitude of pericenter ω. Like Solver
B, he computes formal errors for each parameter.

In summary, the results presented in Sec. 3.3 demonstrate 
that the expected signal can be recovered for over 70% of the 
simulated orbits under the conditions of the T3 test (for every 
two-planet system, periods shorter than 9 years and differing by 
at least a factor of two, 2 < α/σ_ψ < 50, moderate
eccentricities). Favorable orbital configurations (both planets with
periods < 4 years, both astrometric signals at least ten times larger
than the nominal single-measurement error, and redundancy of 
over a factor two in the number of data points with respect to 
the number of fitted parameters) have periods measured to better 
than 10% accuracy > 90% of the time, and comparable results 
hold for other orbital elements. A modest degradation of up to 
10% (slightly different for different parameters) is observed in 
the fraction of correct solutions with respect to the single-planet 
solutions of the T2 test. The useful range of periods for accu- 
rate orbit reconstruction is reduced by about 30% with respect 
to the single-planet case. The overall results are mostly
insensitive to the mutual inclination of pairs of planetary orbits. Over
80% of favorable configurations have i_rel measured to better than
10°, with only mild dependencies on its actual value, or on the
inclination angle with respect to the line of sight of the planets.
Error estimates are generally accurate, particularly for fitted pa- 
rameters such as the orbital period, while (propagated) formal 
uncertainties on the mutual inclination angle seem to often un- 
derestimate the true errors. Finally, it is worth mentioning how, 
as already shown by radial-velocity surveys, long-term astro- 
metric monitoring, even with lower per-measurement precision, 
would be very beneficial for improving on the determination of 
multiple-planet system orbits and mutual alignment, thanks to 
the increasingly higher redundancy in the number of observa- 
tions with respect to the number of estimated model parameters 
in the solutions. 

3.3.1. Overall quality of the solutions

For both experiments, Solver A reports solutions for all stars.
Solver A initially carries out an orbital solution for a single
planet orbiting each star. He then performs a χ²-test on the
post-fit observation residuals, at the 99.73% confidence level. This
allows one to provide an initial assessment of the detectability
of the signal of a second planet in the system, as a function of its
properties.
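
A test of this kind can be sketched as follows. The sketch assumes
Gaussian residuals with a known, common single-measurement error and a
number of degrees of freedom equal to the number of observations minus
the number of fitted parameters; these assumptions, the function names,
and the example numbers are illustrative and do not reproduce Solver
A's code. The χ² survival probability is evaluated through the
regularized incomplete gamma function, and a fit is accepted when
P(χ²) exceeds 0.0027 (the 99.73% confidence level).

```python
import math


def gamma_q(a, x, eps=1e-12, max_iter=10000):
    """Regularized upper incomplete gamma function Q(a, x)."""
    if x <= 0.0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series expansion for P(a, x); Q = 1 - P.
        ap = a
        term = 1.0 / a
        total = term
        for _ in range(max_iter):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction, evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h * math.exp(log_prefactor)


def chi2_probability(chi2, dof):
    """P(chi^2): probability of exceeding chi2 by chance for dof degrees of freedom."""
    return gamma_q(0.5 * dof, 0.5 * chi2)


def fit_is_acceptable(residuals, sigma, n_params, threshold=0.0027):
    """True if the post-fit residuals pass the chi^2 test at the given threshold."""
    chi2 = sum((r / sigma) ** 2 for r in residuals)
    dof = len(residuals) - n_params
    return chi2_probability(chi2, dof) > threshold


if __name__ == "__main__":
    # 30 residuals of one sigma each, 12 fitted parameters (hypothetical case).
    print(fit_is_acceptable([8.0] * 30, sigma=8.0, n_params=12))
```

A star whose single-planet residuals fail this test is a candidate for
an additional companion, while one that passes is classified as a
single-planet system.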

For the first experiment, a total of 509 objects have P(χ²) >
0.0027, and are thus classified as systems with only one planet. Of
these, 289 out of 310 simulated ones are truly star+planet
systems. Of the remaining 220 objects orbited by 2 planets but
for which a single-planet solution appears satisfactory, the
overwhelming majority of the cases (93%) are constituted by systems
in which at least one planet has P exceeding the time-span of
the observations (T = 5 yr), and often the inner planet has
P ≈ T. In virtually all cases, the fitted value of the period is close
to that of the inner planet, or is intermediate between that of the
inner and that of the outer planet. In the 7% of cases in which
both planets have P < 5 yr, one of the astrometric signatures is
always close to the detection limit (α/σ_ψ < 3). Essentially
identical results hold for the second experiment. A more thorough
investigation of the behavior of false detections and of the realm of
degradation of detection efficiency in presence of a second planet
is beyond the scope of this report, and it will require much larger
sample sizes. Finally, Solver A performs a two-planet solution
on all stars. In both experiments, essentially the same fraction
of systems with two planets (~ 73%) passes the χ²-test on the
post-fit residuals, at the 99.73% level. For the remaining 27% of
cases in which a two-planet solution is not satisfactory within
the predefined statistical tolerances, Solver A does not attempt
to fit for a three-planet configuration.

From the results reported by Solver B for the T3b experiment,
24 stars have no solution (in 85% of the cases objects
with fewer than 25 observations). For the remaining 2976 objects,
Solver B fits at least two planets, and a 3-planet orbital
solution is reported for 43% of the sample. Overall, ~ 56% of the
systems are correctly identified by Solver B as having only two
planets, with post-fit P(χ²) > 0.05. The T3a experiment yielded
very similar results. 

Overall, only ~ 40% of the two-planet systems simulated 
have a good solution according to both Solvers. Simply based
on the post-fit χ² test, the two fitting algorithms thus perform
differently in a measurable fashion, unlike the T2 test case, in
which the performance of the two codes for single-planet orbital 
fits was essentially identical. 

The next steps are to focus on good (P(χ²) > 0.0027 for
Solver A, P(χ²) > 0.05 for Solver B) two-planet solutions
reported by the Solvers when the simulated systems are truly
composed of two planets, and investigate a) how well solvers
actually recover the orbital parameters of the planets, b) how the
quality of multiple-planet solutions compares with that of
single-planet fits for planets with comparable properties, and c) how
accurately the actual value of the mutual inclination angle i_rel is
recovered in the case of quasi-coplanar and randomly oriented
pairs of planetary orbits.
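
For reference, the mutual inclination of two orbits follows from their
inclinations and node position angles through the standard relation
cos i_rel = cos i_1 cos i_2 + sin i_1 sin i_2 cos(Ω_2 - Ω_1). The
sketch below evaluates it; the function name and the use of degrees
are illustrative choices.

```python
import math


def mutual_inclination_deg(i1_deg, node1_deg, i2_deg, node2_deg):
    """Mutual inclination (degrees) of two orbits from i and Omega (degrees)."""
    i1 = math.radians(i1_deg)
    i2 = math.radians(i2_deg)
    delta_node = math.radians(node2_deg - node1_deg)
    cos_irel = (math.cos(i1) * math.cos(i2)
                + math.sin(i1) * math.sin(i2) * math.cos(delta_node))
    # Guard against rounding slightly outside [-1, 1].
    cos_irel = max(-1.0, min(1.0, cos_irel))
    return math.degrees(math.acos(cos_irel))


if __name__ == "__main__":
    # Two orbits with similar inclination and nodes 5 degrees apart.
    print(f"{mutual_inclination_deg(60.0, 40.0, 62.0, 45.0):.2f} deg")
```

Since i_rel depends on both inclinations and both node angles, its
uncertainty must be propagated from the errors on all four quantities.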

3.3.2. Multiple-Keplerian orbit reconstruction 

The relative performances of Solver A's and B's algorithms in 
accurately recovering the orbital parameters in the case of two- 
planet systems are quantified using the results for the orbital pe- 
riod of the two planets. This is the most important of the orbital 
parameters, and the most critical in terms of obtaining an orbital 
solution that is close to the truth. As already noted above, we 
find that the overall performance in multiple-planet orbit recon- 
struction does not depend significantly on the relative alignment 
of the orbits, so that we present here results from the T3b exper- 
iment, i.e. the quasi-coplanar orbits case. 

The first noticeable result is the large difference in the
distributions of orbital parameters for the two Solvers. Figure 14
shows, compared to the true simulated ones (solid histograms),
the recovered distributions of orbital periods of the first and
second planet. In the upper four panels, the results for all stars
(excluding objects with only one planet, but for Solver B including
those for which three planets are fitted) are presented for both
Solvers. In panels five and six, Solver B's results are shown only
for stars with good two-planet solutions, while in the last two
panels Solver B's distributions of periods of the second and third
planet are presented, for the sample of stars with three-planet
orbital solutions.

On the one hand, for Solver A's solutions (panels 1 and 2) 
the most obvious feature observed is the fact that in a significant 
number of orbital solutions the periods are swapped (roughly
30% of the cases, averaging over all periods), i.e. the first planet
identified in the data is the second generated in the simulations, 
and vice-versa. This result is easily understood, as, given the 
simulation setup, the dominant signal (identified by, for exam- 
ple, a better sampled period, or a larger astrometric signature) is 
not necessarily the one of the first planet generated. Otherwise, 
Solver A's solutions appear to recover reasonably well the true 
underlying distributions. 
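
When comparing fitted and simulated periods, such swaps have to be
accounted for. One simple way to do so, shown here purely as an
illustration and not as the procedure adopted in this work, is to pair
fitted and true periods in whichever order gives the smaller total
fractional error, and then apply the 10% criterion to each planet. The
function name and the tolerance argument are hypothetical.

```python
def match_periods(true_pair, fitted_pair, tol=0.10):
    """Pair fitted with true periods, allowing for a swap.

    Returns (swapped, first_ok, second_ok), where the last two flags
    state whether each true period is recovered within the fractional
    tolerance tol.
    """
    t1, t2 = true_pair
    f1, f2 = fitted_pair

    def frac_err(fitted, true):
        return abs(fitted - true) / true

    direct = frac_err(f1, t1) + frac_err(f2, t2)
    crossed = frac_err(f2, t1) + frac_err(f1, t2)
    swapped = crossed < direct
    if swapped:
        f1, f2 = f2, f1
    return swapped, frac_err(f1, t1) <= tol, frac_err(f2, t2) <= tol


if __name__ == "__main__":
    # Fitted planets returned in the opposite order to the simulated ones.
    print(match_periods((1.0, 4.0), (4.1, 0.98)))
```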

On the other hand, for Solver B's solutions no obvious pattern
of this kind can be found. Instead, over 1/3 of the periods
identified as dominant are within 0.5 years, and no periods
greater than 5 years are identified (panel 3). The former feature
is in common with the solutions for the second planet (panel 4).
When only two-planet solutions (with good post-fit P(χ²)) are
considered (panels 5 and 6), the recovered distributions still look
largely different from those obtained by Solver A and from the
true ones. Finally, as appears clear by comparing panels 7 and
8, and 5 and 6, the vast majority of short-period orbits
fitted for the second planet (~ 90%), and ~ 50% of those for
the first planet, seem to be the undesired consequence of
three-planet fits, with a correspondingly very large number of long
periods found for the third planet.

Such differences translate into a lower percentage of correctly
identified two-planet systems by Solver B (even when the
post-fit χ²-test is satisfactory). In fact, in Figure 15 we show the
distributions of true periods for the first and second planet
compared to the fitted distributions when the value of the period falls
within 10% of the simulated one. In order to compare results
between the two Solvers in almost identical conditions, for Solver
A only stars with post-fit P(χ²) > 0.05 are included, while for
Solver B only two-planet solutions are considered (all having
P(χ²) > 0.05). Overall, Solver B's algorithm performs about
40% worse than Solver A's (for the first and second planet
respectively, 554 and 807 stars satisfy the above constraints for
Solver B, while for Solver A the equivalent numbers are 993
and 1223). This difference increases to over a factor of two if
Solver A's P(χ²) > 0.0027 criterion is adopted. The number of
stars with both periods simultaneously satisfying the above con- 
ditions is also lower for Solver B, by some 15%. It is true that 
about 10% of the stars for which Solver B performs three-planet 
fits actually have the orbital period of the first and second planet 
fitted falling within the above-mentioned criteria, thus helping 
to somewhat reduce the observed discrepancy in performance. 
However, we will only focus on Solver A's ~ 70% of good two- 
planet orbital solutions (at the 99.73% confidence level), a total 
of 1912 and 1900 stars for the T3a and T3b tests, respectively. 
Focusing on Solver A's cleaner, and larger, sample of good or- 
bital solutions allows one to effectively undertake the compar- 
ison between the T2 and T3 tests, by using stellar samples for 
which orbital solutions have comparable quality. 
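
As an illustration of the bookkeeping behind these comparisons, the sketch below pairs fitted and true periods while allowing for the order swap noted for Solver A, and then applies the 10% criterion. It is not the code used in the tests: the function name `match_periods` and the example values are hypothetical, and the pairing rule (smallest summed fractional error) is an assumption.

```python
def match_periods(true_periods, fitted_periods, tol=0.10):
    """Pair two fitted periods with two true periods, allowing a swap.

    Returns (swapped, recovered): swapped is True when the fitted order is
    reversed with respect to the generated one, and recovered[j] is True when
    the fitted period assigned to true planet j lies within tol of its value.
    """
    t1, t2 = true_periods
    f1, f2 = fitted_periods

    def frac_err(fit, true):
        return abs(fit - true) / true

    # Assumed pairing rule: keep the assignment with the smaller total error.
    direct = frac_err(f1, t1) + frac_err(f2, t2)
    crossed = frac_err(f2, t1) + frac_err(f1, t2)
    swapped = crossed < direct
    a1, a2 = (f2, f1) if swapped else (f1, f2)
    recovered = (frac_err(a1, t1) < tol, frac_err(a2, t2) < tol)
    return swapped, recovered


# Hypothetical system: generated periods 0.8 and 3.5 yr, found in reverse order.
swapped, recovered = match_periods((0.8, 3.5), (3.4, 0.82))
print(swapped, recovered)
```

Counting systems with `recovered == (True, True)` gives the "both periods within 10%" fractions, while `swapped` flags the cases in which the dominant signal was not the first planet generated.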

3.3.3. Comparison with test T2 

We use orbital period and eccentricity as proxies to understand 
the behavior of the two-planet orbital solutions, and compare 
them with analogous results obtained in the T2 experiment. The 
properties of good two-planet solutions should thus be easier to 
understand. 

Fig. 14. Distribution of orbital periods in the multiple-planet solutions (dashed and dashed-dotted lines), compared with the true underlying 
distributions (solid lines). Top two panels: results for planets 1 and 2 obtained by Solver A (all stars). Panels 3 and 4: the same for Solver B, 
including stars with both two and three planets found. Panels 5 and 6: the same for Solver B, but excluding stars with three planets fitted. Bottom 
two panels: the true distribution of the second planet compared with the same distributions for planets two and three obtained by Solver B in the 
sample of three-planet orbital fits. 

For the T3b case (quasi-coplanar orbits), the four panels of 
Figure 16 show, as a function of the value of the true orbital 
period, the fraction of stars with good orbital solutions for which 
the periods of both planets are recovered by Solver A with a 
fractional uncertainty ΔP/P < 10% (where ΔP is the difference 
between fitted and true period value). For comparison, analogous 
results from the T2 experiment are over-plotted, after constraining 
orbital periods, eccentricities, and astrometric signals to lie 
in the same ranges as in the T3b experiment (P < 9 yr, e < 0.6, and 
α < 400 μas). 

Overall, the quality of the solutions degrades quickly already 
for periods > 2 years, and the fraction of systems with both 
orbital periods recovered to within 10% of the true value is at least 
5%-10% lower than in the single-planet case. For configurations in 
which both planets have P < 5 yr and α/σ_ψ > 10, and for which 
a number N_oss > 45 of observations are carried out over the 
5-yr simulated mission lifetime (bottom right panel), the situation 
improves significantly. Over 90% of all orbital configurations 
have both periods measured to better than 10%, and the 5%-10% 
deficit with respect to the T2 experiment applies for periods in 
the range 0.2 < P < 4 yr, for both planets in the systems. A very 
similar behavior is observed (but not shown) in the T3a 
experiment, in which no constraints are put on the mutual inclination 
angles. 

Fig. 16. Fraction of systems with good orbital solutions (P(χ²) > 0.0027) in the T3b experiment for which both orbital periods are 
recovered by Solver A with a fractional uncertainty < 10%, as a function of period (0.5-yr bins). For comparison, the same results 
are displayed for the T2 test. Top left: all stars. Top right: systems with both periods < 5 yr. Bottom left: systems with both periods 
< 5 yr, and with α/σ_ψ > 10. Bottom right: systems with both periods < 5 yr, α/σ_ψ > 10, and with N_oss > 45. 

Formal errors from the fitting procedure appear to match the 
actual errors reasonably well. To determine more quantitatively 
how good an approximation the estimated errors are for the true 
ones, we utilize the same metric adopted in the T2 experiment, 
i.e. the scaled difference ΔP_j/σ_P,j (j = 1, 2), defined as the ratio 
between the difference of the fitted and true values of the orbital 
period of the j-th planet and its corresponding formal uncertainty. We limit 
ourselves to the sample of stars for which Solver A obtains good 
solutions (99.73% confidence level), and for which orbital 
periods are recovered to within 10% accuracy. Figure 17 shows 
that, for both planets, and in both the T3a and T3b experiments, 
the distributions of scaled period differences are quite close to 
the expected one (a Gaussian with zero mean and unit 
dispersion). A small shift in the peak of the ΔP/σ_P distribution for the 
second planet in the T3b test might be present, but its statistical 
significance is low. Elevated tails, however, indicate that a 
non-negligible fraction of objects have underestimated periods (7% 
of the objects lie above the 3-σ threshold, and 2% above the 5-σ 
threshold, outside the scale of the plot in Figure 17). 
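
The scaled-difference metric can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the function names and sample values are hypothetical, and counting the tails in absolute value is an assumption about how the thresholds are applied.

```python
import math


def scaled_differences(fitted, true, sigma):
    """(fitted - true) / formal uncertainty, element by element."""
    return [(f - t) / s for f, t, s in zip(fitted, true, sigma)]


def tail_fraction(scaled, k):
    """Fraction of objects whose scaled difference exceeds k in absolute value."""
    return sum(1 for x in scaled if abs(x) > k) / len(scaled)


def gaussian_tail(k):
    """Two-sided tail probability of a zero-mean, unit-dispersion Gaussian."""
    return math.erfc(k / math.sqrt(2.0))


# Hypothetical fitted periods, true periods and formal errors (yr).
fitted = [1.02, 2.95, 0.51, 4.40]
true = [1.00, 3.00, 0.50, 4.00]
sigma = [0.01, 0.05, 0.02, 0.05]

scaled = scaled_differences(fitted, true, sigma)
for k in (3, 5):
    print(k, tail_fraction(scaled, k), gaussian_tail(k))
```

If the formal errors were exact and Gaussian, only about 0.27% of the objects would fall beyond 3σ (the complement of the 99.73% level), so observed tail fractions of a few percent point to underestimated formal errors for those objects.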



Finally, the two panels of Figure 18 show results for the 
eccentricities of both planets in the systems. Displayed are the 
fractions of systems with good orbital solutions for which the fitted 
values of e are within 0.05 of the true value, the left panel 
displaying results from the full sample with good orbital solutions, 
and the right panel those obtained after applying the above-mentioned 
constraints on periods, astrometric signal, and number of observations. Overall, 
for both planets a degradation of ~ 20% between the 
single-planet and the two-planet solutions is observed, independently 
of the actual value of e. Favorable configurations have e 
determined within 0.05 of the true value about 80% of the time, with a 
degradation of ~ 10% with respect to the single-planet solutions 
of the T2 test, in line with what is found for the orbital periods. 
The modest degradation of ~ 5%-10% in the fraction of 
well-measured periods and eccentricities with respect to the result of 
the T2 test is likely due to the increased number of parameters 
in the two-planet fits (19 vs. 12 in the single-planet solutions), 
given the same number of observations. Other orbital 
parameters follow similar patterns. Again, essentially identical 
results are obtained for the T3a test, demonstrating that the 
relative alignment between pairs of planetary orbits does not seem 
to play a significant role in the ability of Solver A's 
algorithm to reconstruct the orbits of both planets with good 
accuracy, under favorable conditions. 

Fig. 17. Histogram of scaled period differences ΔP/σ_P for good 
two-planet fits (P(χ²) > 0.0027) with periods accurate to better 
than 10% for the T3a (green solid and dashed lines) and T3b 
(red solid and dashed lines) experiments. The dotted curves are 
reference Gaussians with zero mean and unit dispersion. 

Fig. 18. Left: Fraction of systems with good orbital solutions (P(χ²) > 0.0027) in the T3b experiment for which both eccentricities 



3.3.4. Coplanarity measurements 

The mutual inclination i_rel of two orbits is defined as the angle 
between the two orbital planes, and is given by the formula: 

$$\cos i_{\rm rel} = \cos i_{\rm in} \cos i_{\rm out} + \sin i_{\rm in} \sin i_{\rm out} \cos(\Omega_{\rm out} - \Omega_{\rm in}), \qquad (1)$$

where i_in and i_out, Ω_in and Ω_out are the inclinations and lines 
of nodes of the inner and outer planet, respectively. The value of 
i_rel is thus a trigonometric function of i and Ω of both planets, 
and the latter two are in turn derived as non-linear combinations 
of the four Thiele-Innes elements, which are the actual 
parameters fitted for in the orbital solutions. It is thus conceivable that 
any uncertainties in the determination of the linear parameters in 
the two-planet solutions might propagate in a non-trivial manner 
onto the derived value of i_rel, and consequently a value of 
mutual inclination angle close to the truth might be more difficult to 
obtain. 
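
A direct transcription of equation (1) is sketched below, with angles in radians. The function name and the example angles are hypothetical; the clamp before `acos` is a numerical safeguard added here, not something described in the text.

```python
import math


def mutual_inclination(i_in, node_in, i_out, node_out):
    """Mutual inclination (radians) of two orbits, following equation (1)."""
    cos_irel = (math.cos(i_in) * math.cos(i_out)
                + math.sin(i_in) * math.sin(i_out) * math.cos(node_out - node_in))
    # Clamp against rounding so acos never receives |x| slightly above 1.
    return math.acos(max(-1.0, min(1.0, cos_irel)))


deg = math.radians
# Hypothetical pair of orbits: inclinations 40 and 55 deg, nodes 20 deg apart.
print(math.degrees(mutual_inclination(deg(40), deg(10), deg(55), deg(30))))
# Same pair with a 10-deg shift of one node, to show the sensitivity to the nodes.
print(math.degrees(mutual_inclination(deg(40), deg(10), deg(55), deg(40))))
```

Comparing the two printed values shows how an error on one line of nodes feeds into i_rel even when both inclinations are held fixed.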

In the top two panels of Figure 19 we show the fraction of 
stars with good orbital solutions in the T3a and T3b experiments 
for which the derived value of the mutual inclination angle i_rel is 
determined within 10° of the true one by Solver A. The results 
are expressed as a function of i_rel itself. Overall, for Solver A 
both experiments give similar results, showing that the fitting 
algorithm is only mildly sensitive to the mutual inclination of 
pairs of planetary orbits. 

In both cases, Solver A globally recovers ~ 40% of the i_rel 
values to within 10° uncertainty, independently of the value of 
mutual inclination. The fraction of systems for which the actual 
value of i_rel is determined within the above tolerance increases, 
up to 90%, when the constraints on well-sampled, high signal-to-noise 
orbits with a sufficient number of observations are set. 
In the top left panel of Figure 19 both ends of the upper three 
curves are not significant, due to small-number statistics. 
Actually, the results shown in the top right panel can 
be mapped in the top left panel (at least for 2° < i_rel < 10°), thus 
highlighting that the apparent quick degradation in the fraction 
of systems with i_rel accurately determined is not real. It does 
nevertheless appear that, for random mutual orientation of the orbits, 
values of i_rel between 30° and 40° are slightly more likely to be 
identified correctly (by some 20%) than quasi-coplanar cases or 
cases with i_rel close to 90°. For the quasi-coplanar case, perfectly 
coplanar orbits are slightly less likely to be correctly identified. 

Fig. 19. Top left: Fraction of systems in the T3a experiment with satisfactory goodness-of-fit (P(χ²) > 0.0027) for which i_rel is 
determined to within 10° of the true value, as a function of i_rel itself (10 deg bins). Top right: the same for the T3b test (1 deg bins in 
i_rel). Solid lines: all stars; dashed line: both orbital periods < 5 years; dashed-dotted line: α/σ_ψ > 10; long-dashed line: both orbital 
periods < 5 years and α/σ_ψ > 10; dotted line: both orbital periods < 5 years, α/σ_ψ > 10, and N_oss > 45. Bottom left and right: 
Same as the top two panels, this time as a function of the inclination angle of one of the two planets. 

Fig. 20. Top left and right: Same as the two upper panels of Figure 19, but for the formal uncertainties on i_rel as calculated by 
propagating the formal errors on the Thiele-Innes elements from the covariance matrix of the solutions. Bottom left and right: Same 
as the two lower panels of Figure 19, but for the formal uncertainties on i_rel calculated as for the top two panels. 

The two lower panels of Figure 19 show similar results, but 
this time expressed as a function of the inclination angle of one 
of the two planets. Again, Solver A's results for the T3a and T3b 
samples are similar in terms of fractions of systems with i_rel 
correctly identified within 10° of the true value, when the various 
constraints are applied. However, the fraction of quasi-coplanar 
orbits correctly identified seems to be systematically higher, by 
up to 10%, than that of orbits with random values of i_rel, except for the 
region with inclination angles in the intermediate range 30°-50°, 
in which random values of i_rel, away from face-on or edge-on 
configurations, appear to be somewhat favored (by up to 20%). 
Configurations in the T3a experiment in which one of the 
two planets is seen almost face-on appear unfavorable, 
particularly when high signal-to-noise, well-sampled orbits are 
considered. A similar, but less significant (differences up to 10%), trend 
is seen for the case of the T3b experiment (orbits viewed close 
to face-on are less likely to have i_rel measured accurately than 
quasi edge-on configurations). Small-number statistics is not, 
however, responsible for this effect. That determining 
precisely the value of i_rel for almost face-on orbits is somewhat 
more difficult should not in fact come as a surprise, as this result 
had already been discussed in our previous papers on Gaia and 
SIM multiple-planet detection and orbit determination (Sozzetti 
et al. 2001, 2003b). When i → 0°, the uncertainty on the 
position angle of the line of nodes grows, as eventually Ω becomes 
undefined for i = 0°. If one of the two planetary orbits is close 
to face-on, but i_rel is large, then the incorrect identification of Ω 
is reflected in a poorer determination of i_rel. The effect is less 
severe if the two orbits are quasi-coplanar, because in this case, as 
i → 0° for both planets, the term depending on Ω in equation (1) 
becomes very small, and ultimately an accurate knowledge of Ω 
is not required. 
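
The role of the Ω-dependent term can be made explicit by differentiating equation (1) with respect to the node difference ΔΩ = Ω_out − Ω_in at fixed inclinations. This is only a first-order sketch added for illustration, not part of the original analysis:

```latex
\frac{\partial \cos i_{\rm rel}}{\partial\,\Delta\Omega}
  = -\sin i_{\rm in}\,\sin i_{\rm out}\,\sin\Delta\Omega
\quad\Longrightarrow\quad
\sin i_{\rm rel}\;\delta i_{\rm rel}
  \simeq \sin i_{\rm in}\,\sin i_{\rm out}\,\sin\Delta\Omega\;\delta(\Delta\Omega)
```

The prefactor sin i_in sin i_out is the term depending on Ω referred to in the text: it tends to zero when both orbits approach a face-on configuration.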

Finally, in Figure 20 we show the behavior of the nominal 
uncertainties on i_rel obtained by propagating the formal errors 
on the Thiele-Innes elements from the covariance matrix of the 
solutions. The results are plotted as a function of i_rel (upper 
panels) and i of one of the two planets (lower panels). The nominal 
uncertainties appear to follow rather closely the actual errors. We 
note, however, that in several cases formal errors seem to 
underestimate the real ones. This effect is highlighted by 
systematically higher fractions of objects with low values of the nominal 
errors with respect to the real ones. This mild trend is observed 
for all values of i_rel and i, and in both experiments. 
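
A first-order propagation of this kind can be sketched as below. It is not the Solvers' implementation: the function names are hypothetical, the Jacobian is taken by central finite differences, the node is folded into [0, π) by assumption, and the example elements and the diagonal covariance are invented for illustration.

```python
import math


def thiele_innes(a, omega, node, inc):
    """Thiele-Innes constants (A, B, F, G) from Campbell elements (radians)."""
    co, so = math.cos(omega), math.sin(omega)
    cn, sn = math.cos(node), math.sin(node)
    ci = math.cos(inc)
    return (a * (co * cn - so * sn * ci), a * (co * sn + so * cn * ci),
            a * (-so * cn - co * sn * ci), a * (-so * sn + co * cn * ci))


def inc_and_node(A, B, F, G):
    """Inclination and node (radians); the node is folded into [0, pi)."""
    k = 0.5 * (A * A + B * B + F * F + G * G)
    m = A * G - B * F
    a_sq = k + math.sqrt(max(k * k - m * m, 0.0))   # semi-major axis squared
    inc = math.acos(max(-1.0, min(1.0, m / a_sq)))
    node = 0.5 * (math.atan2(B - F, A + G) - math.atan2(-B - F, A - G))
    return inc, node % math.pi


def i_rel_from_thiele_innes(p):
    """Mutual inclination from p = (A1, B1, F1, G1, A2, B2, F2, G2)."""
    i1, n1 = inc_and_node(*p[:4])
    i2, n2 = inc_and_node(*p[4:])
    c = (math.cos(i1) * math.cos(i2)
         + math.sin(i1) * math.sin(i2) * math.cos(n2 - n1))
    return math.acos(max(-1.0, min(1.0, c)))


def propagated_sigma(func, p, cov, rel_step=1e-6):
    """First-order error on func(p): sqrt(J C J^T), J by central differences."""
    grad = []
    for j in range(len(p)):
        h = rel_step * max(abs(p[j]), 1.0)
        up, down = list(p), list(p)
        up[j] += h
        down[j] -= h
        grad.append((func(up) - func(down)) / (2.0 * h))
    n = len(p)
    var = sum(grad[r] * cov[r][c] * grad[c] for r in range(n) for c in range(n))
    return math.sqrt(max(var, 0.0))


# Hypothetical two-planet system (signatures in micro-arcsec, angles in radians).
p = list(thiele_innes(100.0, 0.3, 0.5, 0.7) + thiele_innes(250.0, 1.2, 0.9, 1.0))
# Hypothetical covariance: uncorrelated 8 micro-arcsec errors on every constant.
cov = [[64.0 if r == c else 0.0 for c in range(8)] for r in range(8)]
sigma = propagated_sigma(i_rel_from_thiele_innes, p, cov)
print(math.degrees(i_rel_from_thiele_innes(p)), math.degrees(sigma))
```

A real covariance matrix would include the off-diagonal terms between the two planets' constants; close to face-on the mapping becomes strongly non-linear, so a linear estimate of this kind can understate the true scatter.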

3.4. Directions for future work 

Several complex issues have been left aside in the preliminary 
analyses carried out for all experiments of the double-blind tests 
program, such as correlations between orbital parameters and 
their errors, more thorough investigations of how well formal 
errors map the real ones, or in-depth studies of the conditions 
in which two-planet orbital fits are more likely to fail (e.g., due 
to covariance between proper motion solutions and long-period 
orbits). These topics will require rather sophisticated approaches 
and a more aggressive understanding of correlations and aliasing 
in orbital parameter space, and significantly larger sample sizes. 

Another area of potential improvement concerns the 
possibility to explore alternative methods for orbit fitting to improve 
on the interpretation of the observations and ultimately the 
inferences concerning the overall population of planets. One possible 
avenue could be the evaluation of the applicability of Bayesian 
model selection, based on Markov chain Monte Carlo algorithms 
(e.g., Ford & Gregory 2007), to simulated Gaia data, in order to 
gauge their potential for accurate characterization of orbital 
parameters and their uncertainties. 

The understanding of the technical specifications of the Gaia 
satellite and its astrometric instrument will develop further with 
time, so some of the simplifying assumptions in our 
simulations will be progressively relaxed, and a more realistic 
error model (e.g., including zero-point uncertainties, calibration 
errors, chromaticity effects, attitude error) and a realistic error 
distribution for σ_ψ, including bias and magnitude terms, will be adopted. 

Finally, there is margin for adding more realism to our ref- 
erence model of planetary systems, by considering actual distri- 
butions of orbital parameters and masses, and up-to-date values 
of planetary frequencies. We will include some degree of mu- 
tual dynamical interactions in representative cases of planetary 
systems, and evaluate in detail the impact of possible sources of 
astrometric noise that might pollute and/or mimic planetary sig- 
natures (e.g., binarity of the parent star, stellar spots, and proto- 
planetary disks, whose impact can be seen in terms of additional 
dynamical perturbations as well as contamination by scattered 
light). 

4. Discussion: Gaia in context 

The striking properties revealed by the observational data on ex- 
trasolar planets (for a review, see e.g. Udry et al. 2007) reflect 
the complexities inherent in the processes of planet formation 
and evolution. The comparison between theory and observation 
has shown that several difficult problems currently limit 
our ability to elucidate all the various phases in a unified manner. 
Rather, one often resorts to investigating separately limited 
aspects of the physics of planet formation and evolution using 
a 'compartmentalized' approach. 

However, improvements are being made toward the defini- 
tion of more robust theories capable of simultaneously explain- 
ing a large range of the observed properties of extrasolar planets, 
as well as of making new, testable predictions. To this end, help 
from future data obtained with a variety of techniques will prove 
invaluable. In light of the results of the double-blind tests cam- 
paign presented in the previous sections, we focus here on the 
potential of high-precision global astrometry with Gaia, as com- 
pared to other planet detection methods, to help answer several 
outstanding questions in the science of planetary systems. 

4.1. Gaia discovery space 

We show in Figure 21 a summary of the results presented in the 
previous sections, in terms of the minimum astrometric 
signature required for detection and measurement of orbital 
parameters and masses with Gaia, as a function of the orbital period of 
the companion, and averaging over all other orbital parameters. 

The curves in Figure 21 correspond, respectively, to 
iso-probability contours for 95% efficiency (virtual completeness) 
in detection at the 99.73% confidence level, 50% probability of 
measuring the companion mass to better than 15% accuracy, and 
for the same likelihood of measuring eccentricities with 
uncertainties lower than 0.1 and the inclination angle of the orbital 
plane to better than 10° accuracy. All curves are polynomial fits 
to the actual iso-probability curves, with extrapolations for 
values of P < 0.2 yr and P > 12 yr, i.e. outside the period range 
covered by our simulations. For comparison, the minimum 
astrometric signatures (assuming sin i = 1) and orbital periods of 
the present-day planet sample are overplotted. The plot, which 
closely resembles those presented in our earlier works (Lattanzi 
et al. 2000a; Sozzetti et al. 2002), indicates that Gaia would 
detect ~ 55% of the extrasolar planets presently known (the exact 
fraction depending on the actual value of sin i), and for > 50% 
of these it would be capable of accurately measuring orbital 
parameters and actual masses. 

However, ongoing and planned surveys for planets with a 
variety of techniques are being designed to embrace the three- 
fold goal of 1) following-up and improving on the characteri- 
zation of the presently known extrasolar planet sample, 2) tar- 
geting more carefully defined and selected stellar samples, and 
3) covering new areas of the planet discovery space, with the 
ultimate expectation of eventually reaching the capability to dis- 
cover Earth-sized planets in the Habitable Zone (e.g., Kasting 
et al. 1993) of nearby stars. Indeed, by the time Gaia flies vari- 
ous other observatories will be operational, gathering additional 
information on the already known extrasolar planets sample 
and producing a wealth of new discoveries. For example, both 
ground-based as well as space-borne instrumentation for astro- 
metric planet searches is being developed, such as VLTI/PRIMA 
(Delplancke et al. 2006) and SIM PlanetQuest, with targeted 
single-measurement precision comparable to, if not higher than, 
Gaia's. Then, the most effective way to proceed in order to gauge 
the relative importance of the Gaia global astrometric survey is 
not by looking at its discovery potential per se, but rather in con- 
nection with outstanding questions to be addressed and answered 
in the science of planetary systems, thus helping to discriminate 
between proposed models of planet formation and evolution. 

By doing so, one immediately realizes that Gaia's 
unique contribution will likely reside in the unbiased and 
complete magnitude-limited census of stars of all ages, spectral types, 
and metallicities in the solar neighborhood that could be screened 
for new planets, rather than in the additional insight its 
measurements might give on already discovered planets. In order to 
quantify our statement, we convert the results in Figure 21 into the 
equivalent range of companion masses and semi-major axes that 
could be detected and measured orbiting a star of given mass and 
at a given distance from the Sun. For illustration, we show in Figure 22 
Gaia's discovery space in the M_p-a plane for 3σ detection (with 
95% probability) and for accurately measuring > 50% of the 
time orbital elements and masses of planets orbiting a 1-M☉ star 
at 200 pc, and a 0.5-M☉ M dwarf at 25 pc (objects with V < 13, 
for which Gaia's highest astrometric precision can be achieved). 
From the Figure, one would then conclude that Gaia could 
discover and measure massive giant planets (M_p > 2-3 M_J) with 
1 < a < 4 AU orbiting solar-type stars as far as the nearest 
star-forming regions, as well as explore the domain of Saturn-mass 
planets with similar orbital semi-major axes around late-type 
stars within 30-40 pc. Particularly for the latter case, the Gaia 
sensitivity nicely complements at wider separations the area of 
the discovery space covered by ground-based transit photometry 
and decade-long Doppler surveys (see caption for details). 
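
The conversion from an astrometric-signature threshold to a companion mass can be sketched with the usual small-angle expression α = (M_p/M_*)(a/d), which gives α in arcsec when a is in AU and d in pc. The snippet assumes that this definition matches the one used in the paper and takes a flat threshold of 3 × 8 μas, so it ignores the period dependence of the curves in Figure 21; the function names and the Jupiter-to-solar mass ratio are assumptions made for the example.

```python
M_JUP_IN_MSUN = 9.546e-4   # approximate Jupiter mass in solar masses


def astrometric_signature_uas(m_planet_mjup, m_star_msun, a_au, d_pc):
    """Astrometric signature in micro-arcsec for a planet at semi-major axis a."""
    mass_ratio = m_planet_mjup * M_JUP_IN_MSUN / m_star_msun
    return mass_ratio * (a_au / d_pc) * 1.0e6


def min_mass_mjup(alpha_min_uas, m_star_msun, a_au, d_pc):
    """Smallest planet mass (M_J) whose signature reaches alpha_min_uas."""
    return alpha_min_uas * 1.0e-6 * m_star_msun * d_pc / (a_au * M_JUP_IN_MSUN)


threshold = 3 * 8.0   # flat 3-sigma threshold for an 8 micro-arcsec error
# The two illustrative primaries quoted in the text, at hypothetical separations.
print(min_mass_mjup(threshold, 1.0, 3.0, 200.0))   # solar-mass star at 200 pc
print(min_mass_mjup(threshold, 0.5, 2.0, 25.0))    # 0.5 solar-mass M dwarf at 25 pc
```

Because the required mass scales as M_* d / a, the same threshold reaches much lower masses around a nearby M dwarf than around a solar-type star at 200 pc.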

4.1.1. How many planets will Gaia find? 

To better gauge the Gaia potential for planet discovery, we up- 
date the early results of Lattanzi et al. (2000b), and re-compute 
the number of possible planetary systems within Gaia's grasp 
using estimates of the stellar content in the solar neighborhood 
and our present-day understanding of the giant planet frequency 






S. Casertano et al.: DBT Campaign for Planet Detection with Gaia 



[Figure 21: minimum astrometric signature (vertical axis, ticks from 10 to 1000) versus orbital period in yr (horizontal axis, ticks from 0.01 to 10.00). Legend: 95%: detect (3σ); 50%: mass to better than 15%; 50%: eccentricity uncertainty < 0.1; 50%: inclination uncertainty < 10°.] 



Fig. 21. Boundaries of secure (3σ, for σ = 8 μas) detection and accurate mass and orbital parameters determination with Gaia 
compared to the known extrasolar planets (data from http://exoplanet.eu), which are plotted for the minimum case: orbit viewed 
edge-on, true mass equals radial velocity minimum mass, and astrometric signature minimum. Lines of different shape represent 
the minimum astrometric signature for 95% probability of a 3σ detection (solid line), the minimum astrometric signature needed 
to determine at least 50% of the time the mass of a planet with better than 15% accuracy (dash-dotted line), the eccentricity with 
uncertainties < 0.1 (short-dashed line), and the inclination angle with uncertainties < 10° (long-dashed line), respectively. The true 
astrometric signature, which is proportional to the true mass, will be generally higher, much higher in some cases, with the effect 
that more reliable detections and orbital fits will be possible. 



Table 2. Number of giant planets detected and measured by Gaia. 

Δd (pc)    N_*          Δa (AU)    ΔM_p (M_J)    N_d       N_m 
0-50       ~ 10 000     1.0-4.0    1.0-13.0      ~ 1400    ~ 700 
50-100     ~ 51 000     1.0-4.0    1.5-13.0      ~ 2500    ~ 1750 
100-150    ~ 114 000    1.5-3.8    2.0-13.0      ~ 2600    ~ 1300 
150-200    ~ 295 000    1.4-3.4    3.0-13.0      ~ 2150    ~ 1050 



distribution f_p. For the former, we use the Besançon model of 
stellar population synthesis (Bienaymé et al. 1987; Robin & 
Crézé 1986), constrained to V < 13 and to spectral types 
earlier than K5. According to this Galaxy model, we should expect 
N_* ~ 15 000, ~ 61 000, ~ 175 000, and ~ 470 000 stars within 
radii of 50 pc, 100 pc, 150 pc, and 200 pc, respectively (see 
Figure 23). For f_p, we take the Tabachnik & Tremaine (2002) 
approach, and use a power-law functional form to integrate a 
differential fraction within an arbitrary range of M_p and P: 

$$df_p = C\, M_p^{\beta}\, P^{\gamma}\, dM_p\, dP \qquad (2)$$

We find the normalization C by using the Tabachnik & 
Tremaine (2002) values for the exponents (β = -1.1, γ = 
-0.73), which still provide a good description for the observed 
Fig. 22. Gaia discovery space for planets of given mass and orbital radius compared to the present-day sensitivity of other indirect 
detection methods, namely Doppler spectroscopy and transit photometry. Red curves of different styles have the same meaning as 
in Figure 21, assuming a 1-M☉ G dwarf primary at 200 pc, while the blue curves are for a 0.5-M☉ M dwarf at 25 pc. The radial 
velocity curve (pink line) is for detection at the 3σ_RV level, assuming σ_RV = 3 m s^-1, M_* = 1 M☉, and 10-yr survey duration. For 
transit photometry (green curve), the assumptions of Gaudi et al. (2005) are used, i.e. σ_V = 5 milli-mag, S/N = 9, M_* = 1 M☉, 
R_* = 1 R☉, uniform and dense (> 1000 datapoints) sampling. Black dots indicate the inventory of exoplanets as of September 2007. 
Transiting systems are shown as light-blue filled pentagons. Jupiter and Saturn are also shown as red pentagons. 



mass and period distributions of exoplanets (see for example 
Butler et al. 2006), and by imposing that the fraction of planets 
with 1 < M_p < 15 M_J and 2 < P < 3000 d equals the observed 
7% for F-G-K normal stars with -0.5 < [Fe/H] < 0.5 (Marcy et 
al. 2005). 

An estimate of the number of giant planets at a given 
distance d (in pc) whose astrometric signal could be detected by 
Gaia with 3σ confidence 95% of the time is then given by 
N_d ~ 0.95 × f_p × N_*, where N_* is computed within a sphere 
of radius d centered on the Sun for given limiting magnitude 
and spectral type, while the value of f_p is calculated by 
integrating over a specific range of masses and periods. The number of 
planets for which, say, masses will be determined at least 50% 
of the time with an accuracy of better than 15% will instead be 
N_m ~ 0.50 × N_d. The results are summarized in Table 2. One 
then realizes that, based on our present knowledge of giant planet 
frequencies (M_p > 1-3 M_J), integrated over a wide range 
of spectral types and metallicities, Gaia could find ~ 8000 
such objects, and accurately measure masses and orbital 
parameters for ~ 4000 of them. 
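
The chain from equation (2) to the expected counts can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of Table 2: masses are in M_J and periods in days, the function names are hypothetical, and the mass range, period range and star count used in the example are invented.

```python
import math

BETA, GAMMA = -1.1, -0.73   # power-law exponents quoted in the text


def power_integral(lo, hi, exponent):
    """Integral of x**exponent dx between lo and hi."""
    p = exponent + 1.0
    if abs(p) < 1e-12:
        return math.log(hi / lo)
    return (hi ** p - lo ** p) / p


def normalization(fraction, m_range, p_range):
    """C such that equation (2), integrated over the ranges, equals fraction."""
    return fraction / (power_integral(*m_range, BETA)
                       * power_integral(*p_range, GAMMA))


def planet_fraction(c_norm, m_range, p_range):
    """Integrated planet frequency f_p over a mass range and a period range."""
    return (c_norm * power_integral(*m_range, BETA)
            * power_integral(*p_range, GAMMA))


def expected_counts(f_p, n_stars, detect_eff=0.95, measure_frac=0.50):
    """N_d = 0.95 f_p N_* and N_m = 0.50 N_d, as in the text."""
    n_d = detect_eff * f_p * n_stars
    return n_d, measure_frac * n_d


# Normalize to 7% for 1-15 M_J and 2-3000 d, as stated in the text.
c_norm = normalization(0.07, (1.0, 15.0), (2.0, 3000.0))
# Hypothetical sub-range: 2-10 M_J, periods between 1 and 5 yr, 1000 stars.
f_p = planet_fraction(c_norm, (2.0, 10.0), (365.25, 5 * 365.25))
n_d, n_m = expected_counts(f_p, 1000)
print(f_p, n_d, n_m)
```

In practice the mass and period limits would follow from the detection curves at each distance, which is why the ranges in Table 2 change from one distance bin to the next.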



4.1.2. How many multiple-planet systems will Gaia find? 

As of December 2007, 24 planet-bearing stars are orbited by 
more than one planet, corresponding to ~ 12% of the total 
sample of RV-detected systems (1). However, many systems known to 
host one exoplanet show more distant, long-period, sub-stellar 
companions with highly significant but incomplete orbits (with 
inferred semi-major axis typically beyond 5 AU). Recent 
analyses of these long-term trends (Wright et al. 2007) indicate that 
~ 30% of known exoplanet systems show significant evidence of 
multiplicity. Considering that the mass distribution of planets 
increases steeply toward lower masses (e.g., Marcy et al. 2005), 
incompleteness must be considerable between 1.0 and 0.1 Jupiter 
masses. Thus, the actual occurrence of multiple planets among 
stars having one known planet is likely considerably greater than 
30%. 

(1) Johnson et al. (2007) and Setiawan et al. (2008) report possible 
multiple companions around GJ 317 and HD 47536. We elect not to 
include them as their orbits are either only loosely constrained or not 
yet statistically very significant. 

We report in Table 3 the relevant parameters of the 
multiple-planet systems with well-measured orbits known to date, 
ordered by increasing distance of the system from the Sun. The 
expected values of the astrometric signature (α_min) are computed 
assuming perfectly edge-on, coplanar configurations (sin i_j = 1, 
for j = 1, ..., n_p). The single-measurement precision is σ_ψ = 8 
μas for all stars. Of these systems, ~ 50% have more than one 
component with α_min > 3σ_ψ, ~ 40% have components with 
α_min > 3σ_ψ as well as P < 5-6 yr, and some 16% have both 
α_min > 10σ_ψ and P < 5-6 yr. Extrapolating from the 
numbers obtained in the previous Section and the ones above, 
one then infers that of the ~ 8000 new planetary systems 
discovered by Gaia, ~ 1000 would have multiplicity greater than 
one, and ~ 400-500 could have orbital parameters and masses 
measured to better than 15%-20% accuracy. 
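
One way to read this extrapolation, offered only as a plausibility check and assuming that the present-day fractions are applied directly to the ~ 8000 expected detections (the text does not spell out the arithmetic), is:

```latex
N_{\rm multi} \simeq 0.12 \times 8000 \approx 1000, \qquad
N_{\rm orbits} \simeq (0.40\text{--}0.50) \times 1000 \approx 400\text{--}500
```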
Table 3. List of relevant parameters for known planetary systems. 

Planet        d (pc)    M_* (M☉)    M_p sin i (M_J)    a (AU)    α_min (μas) 
GJ 876b         4.72      0.32        1.93              0.21      265.6 
GJ 876c                               0.56              0.13       48.0 
GJ 876d                               0.02              0.02        0.2 
GJ 581b         6.26      0.31        0.05              0.04        1.0 
GJ 581c                               0.02              0.07        0.6 
GJ 581d                               0.02              0.25        3.1 
HD 69830b      12.60      0.86        0.03              0.08        0.2 
HD 69830c                             0.04              0.19        0.7 
HD 69830d                             0.06              0.63        3.4 
55 Cnc b       13.40      1.03        0.78              0.11        … 
υ And b        … 13.97 … 2.11 … 
HD 160691b     15.30      1.08        1.67              1.50        … 
HD 160691c                            3.10              4.17      781.1 
HD 160691d                            …                 0.09        0.2 
HD 160691e                            …                 0.92       29.1 
HD 190360c     15.89      1.04        0.06              0.13        0.5 
HD 190360b     … 
HD 128311b     … 
HD 11964b      … 0.23 … 
HD 217107b     37.00 … 
Table 4. Number of multiple-planet systems detected and measured by Gaia. 

Case                                                      Number of Systems 
1) Detection                                              ~ 1000 
2) Orbits and masses to better than 15%-20% accuracy      ~ 400-500 
3) Successful coplanarity tests                           ~ 150 



For some 150 systems with very favorable configurations, 
and enough redundancy in the number of observations, 
coplanarity tests could be performed, with expected uncertainties on 
the mutual inclination angle of ~ 10°, or smaller. In terms of 
systems for which the Gaia data alone could provide reasonably 
good orbital solutions, this is about a twenty-fold improvement 
with respect to the present-day number of systems with 
well-determined orbits, and even the number of potential systems 
for which coplanarity analysis could be successfully carried out 
compares favorably to today's sample, presently populated by 
zero objects. These numbers are summarized in Table 4. Again, 
these results should be considered as lower limits, given the 
increasingly convincing evidence for a frequency of 
multiple-planet systems at least a factor of 2-3 greater than the value used 
here for the extrapolation. 

4.2. The Gaia legacy 

It is easy to realize how the statistical value of such large samples 
of newly detected giant planets and planetary systems would be 
instrumental for critical testing of planet formation and evolution 
models. To illustrate more clearly the wealth of information po- 
tentially contained in the data collected by Gaia, let us ask four 
fundamental questions for the astrophysics of planetary systems, 
and see how, based on the results presented in this paper, Gaia 
could help address them (complementing other datasets obtained 
with a variety of techniques). 

4.2.1. How do planet properties and frequencies depend
upon the characteristics of the parent stars?

Twelve years after the first discovery announcement (Mayor
& Queloz 1995), the observational data on extrasolar planets
are providing growing evidence that planetary system properties
(orbital elements and mass distributions, and correlations
amongst them) and frequencies appear to depend upon the
characteristics of the parent stars (spectral type, age, metallicity,
binarity/multiplicity). Doppler surveys have begun in the recent
past to put such trends on firmer statistical grounds. For example,
dedicated surveys of metal-rich (Fischer et al. 2005; Bouchy
et al. 2005) and metal-poor dwarfs (Sozzetti et al. 2006; Mayor
et al. 2003) are currently providing data to improve the statistical
significance of the strong correlation between planet occurrence
rates and stellar metallicity (e.g., Gonzalez 1997; Santos et
al. 2004; Fischer & Valenti 2005). Similarly, other groups have
been monitoring samples of bright M dwarfs (Butler et al. 2004;
Bonfils et al. 2005; Endl et al. 2006; Johnson et al. 2007b, and
references therein), Hertzsprung gap sub-giants (Johnson et al.
2006, 2007a), heavily evolved stars belonging to the red-giant
branch and clump regions of the H-R diagram (Frink et al. 2002;
Setiawan et al. 2005; Sato et al. 2003; Hatzes et al. 2005; Lovis
& Mayor 2007; Niedzielski et al. 2007, and references therein),
early-type dwarfs (Galland et al. 2005), and relatively young
stars (Setiawan et al. 2007a), in order to probe the possible
dependence of f_p on stellar mass and age. However, the typical
sample sizes of these surveys are of the order of a few hundred
objects, sufficient to test only the most outstanding differences
between the various populations. It is thus desirable to be able to
provide as large a database as possible of stars screened for
planets.

Fig. 23. Stellar content to d < 200 pc, as a function of the spectral
type, for V < 13 (solid line) and V < 12 (dotted line).

As we have seen, the size of the stellar sample available for
planet detection and measurement to the Gaia all-sky astrometric
survey will be approximately a few hundred thousand relatively
bright (V < 13) stars with a wide range of spectral types,
metallicities, and ages out to ~ 200 pc. The sample size is thus
comparable to that of planned space-borne transit surveys, such
as CoRot and Kepler. The expected number of giant planets
detected and measured (see Table 2) could be several thousand,
depending on actual giant planet frequencies as a function of
spectral type and orbital distance. This number is comparable
to the size of the combined target lists of present-day ground-based
Doppler surveys and of future astrometric projects such as
VLTI/PRIMA and SIM. The Gaia unbiased and complete
magnitude-limited census of stars screened for new planets will make
it possible, for example, to test the fine structure of giant planet
parameter distributions and frequencies, and to investigate their
possible changes as a function of stellar mass with unprecedented
resolution. From Figure 23, of order tens of thousands of normal
stars in 0.1 M_⊙ bins would become available for such
investigations. Furthermore, the ranges of orbital parameters and giant
planet host characteristics probed by the Gaia survey would
crucially complement both transit observations (which strongly
favor short orbital periods and are subject to stringent requisites on
favorable orbital alignment), and radial-velocity measurements
(which can be less effectively carried out for stars covering a
wide range of spectral types, metallicities, and ages and do not
allow one to determine either the true planet mass or the full
three-dimensional orbital geometry).

Thus, the ability to simultaneously and systematically deter- 
mine planetary frequency and distribution of orbital parameters 
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for the stellar mix in the solar neighborhood without any poten- 
tial biases induced by the choice of specific selection criteria for 
target lists, stands out as a fundamental contribution that Gaia
will uniquely provide, the only limitations being those intrinsic 
to the mission, i.e., to the actual sensitivity of the Gaia measure- 
ments to planetary perturbations, which in this paper we have 
quantitatively gauged. 

4.2.2. What is the preferred method of gas giant planet 
formation? 



Fig. 24. Stellar distribution in the solar neighborhood (d < 200
pc) as a function of metallicity [Fe/H], for V < 13 (solid line) and
V < 12 (dotted line).

The two main competing models of giant-planet formation
by core accretion (e.g., Pollack et al. 1996; for a review see
Lissauer & Stevenson 2007) and disk instability (e.g., Boss 2001;
for a review see Durisen et al. 2007) make very different predictions
regarding formation time-scales (Mayer et al. 2002; Alibert
et al. 2005; Boss 2006), planet properties (Armitage et al. 2002; 
Kornet & Wolf 2006; Ida & Lin 2004a, 2005, 2008; Rice et al. 
2003b), and frequencies as a function of host star characteris- 
tics (Laughlin et al. 2004; Ida & Lin 2004b, 2005, 2008; Kornet 
et al. 2005, 2006; Rice et al. 2003b; Boss 2000, 2002, 2006). 
Furthermore, correlations between orbital elements and masses, 
and possibly between the former and some of the host star char- 
acteristics (metallicity, mass) might reflect the outcome of a vari- 
ety of migration processes and their possible dependence on en- 
vironment (Livio & Pringle 2003; Ida & Lin 2004a, 2008; Boss 
2005; Burkert & Ida 2007). Some of these predictions could be 
tested on firm statistical grounds by extending planet surveys to 
large samples of stars that are not readily accessible to Doppler 
surveys. 

For example, Galaxy models (Bienayme et al. 1987; Robin
& Creze 1986) predict ~ 4000 F-G-K dwarfs and sub-dwarfs
to 200 pc, brighter than V = 13 mag, and with metallicity
[Fe/H] < -1.0 (see Figure 24). The entire population will be
screened by Gaia for giant planets on wide orbits, thus
complementing the shorter-period ground-based spectroscopic surveys
(e.g., Sozzetti et al. 2006), which are also limited in their sample
sizes by the intrinsic faintness of the targets and the weakness of
their spectral lines. These data combined would allow for
improved understanding of the behavior of the probability of planet
formation in the low-metallicity regime, by direct comparison
between large samples of metal-poor and metal-rich stars, in turn
putting stringent constraints on the proposed planet formation
models and helping to better understand the role of stellar
metallicity in the migration scenarios for gas giant planets.

Table 5. The closest (< 200 pc) star forming regions and young
stellar kinematic groups.

Name                  Distance (pc)   Age (Myr)

Hercules-Lyra         15-40           100
AB Doradus            20-50           30-50
Subgroup B4           20-50           80-100
β Pictoris            30-50           8-15
Tucana-Horologium     ou-ou           O-JU
TW Hya                50              3-50
MBM 12                60-110          3-10
η Chamaeleontis       90-150          8-10
η Carinae             100             8
MBM 20                110-160         3-10
Pleiades              125             75-100
ρ Ophiuchi            125-150         1-2
Taurus-Auriga         135             1-2
Corona Austrina       140             1-2
Lupus                 140             1-2
o Velorum             160             30
θ Carinae             160             30
Scorpio-Centaurus     160-180         2-20
α Persei              175             85
Serpens               200             5-10



Furthermore, within the useful (for Gaia) distance horizon
of ~ 200 pc, hundreds of relatively bright (V < 13-14) young
stars can be found in some twenty or so nearby star-forming
regions and young associations (see Table 5 for a list of young
associations, open clusters, and moving groups in the solar
neighborhood, with ages in the approximate range 1-100 Myr.
The data, ordered by increasing distance from the Sun, are from
Zuckerman & Song (2004, and references therein) and
Lopez-Santiago et al. (2006, and references therein)). All these
stars will be observed by Gaia with enough astrometric precision
to detect the presence of massive giant planets (M_p > 2 M_J)
orbiting at 2-4 AU. The possibility to determine the epoch of
giant planet formation in the protoplanetary disk would provide
the definitive observational test to distinguish between the
proposed theoretical models. These data would uniquely complement
near- and mid-infrared imaging surveys (e.g., Burrows 2005, and
references therein), such as those with JWST, for direct detection
of young, bright, wide-separation (a > 30-100 AU) giant planets.

4.2.3. How do dynamical interactions affect the architecture 
of planetary systems? 

The highly eccentric orbits of planetary systems have so far been
explained by invoking a variety of dynamical mechanisms, such as
interactions between a planet and the gaseous disk, planet-planet
resonant interactions, close encounters between planets, and
secular interactions with a companion star (see for example
Ford & Rasio 2007, and references therein). Some of these
eccentricity excitation mechanisms can give rise to very different
orbital architectures, including significantly non-coplanar
orbits (Thommes & Lissauer 2003). An effective way to
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understand their relative roles would involve measuring the mu- 
tual inclination angle between pairs of planetary orbits. Studies 
addressing the long-term dynamical stability issue for multiple- 
planet systems (presently divided in three broad classes of hi- 
erarchical, secularly interacting and resonantly interacting sys- 
tems. See for example Kiseleva-Eggleton et al. 2002; Ji et al. 
2003 and references therein; Correia et al. 2005; Barnes & Quinn 
2004; Gozdziewski & Konacki 2004 and references therein), 
as well as the possibility of formation and survival of terres- 
trial planets in the Habitable Zone of the parent star (Menou & 
Tabachnik 2003; Jones et al. 2005 and references therein), would 
also greatly benefit from knowledge of the mutual inclination an- 
gle between planetary orbits. 

The only way to provide meaningful estimates of the full
three-dimensional geometry of any planetary system (without
restrictions on the orbital alignment with respect to the line of
sight) is through direct estimates of the mutual inclination
angles using high-precision astrometry. We have shown here how,
extrapolating from today's knowledge of the frequency and
architectures of multiple-planet systems, Gaia could detect and
measure several hundred such systems, and perform a significant
coplanarity analysis in some 150 cases (see Table 4). These
data, combined with those available from Doppler measurements,
transit photometry, and transit timing (e.g., Miralda-Escudé
2002; Holman & Murray 2005; Agol et al. 2005), would
then allow studies of the dynamical evolution of planetary
systems to be put on firmer grounds.
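
As an illustration of the coplanarity test discussed above, the mutual inclination of two orbits follows from their individual inclinations and node position angles through a standard spherical-trigonometry identity. The sketch below is a minimal example of that identity only; the function name and argument layout are hypothetical and are not taken from the analysis codes used in this paper.

```python
import math


def mutual_inclination_deg(i1_deg, node1_deg, i2_deg, node2_deg):
    """Angle (degrees) between two orbital planes.

    Standard identity (not specific to this paper):
    cos(i_rel) = cos(i1) cos(i2) + sin(i1) sin(i2) cos(node2 - node1)
    """
    i1 = math.radians(i1_deg)
    i2 = math.radians(i2_deg)
    dnode = math.radians(node2_deg - node1_deg)
    cos_irel = (math.cos(i1) * math.cos(i2)
                + math.sin(i1) * math.sin(i2) * math.cos(dnode))
    # Clamp to [-1, 1] so rounding errors cannot break acos.
    cos_irel = max(-1.0, min(1.0, cos_irel))
    return math.degrees(math.acos(cos_irel))


if __name__ == "__main__":
    # Two orbits with equal inclination but nodes 180 degrees apart.
    print(mutual_inclination_deg(60.0, 10.0, 60.0, 190.0))
```

Both the inclinations and the node angles are needed, which is why astrometry, rather than radial velocities alone, can deliver this quantity.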

4.2.4. What are the phase functions and light curves of gas 
giant planets? 

The combination of high-cadence, milli-mag photometric and
1-5 m s^-1 precision radial-velocity measurements of transiting
planet systems provides the fundamental observational data
(planetary mass, radius, density, and gravity) needed for a
meaningful comparison with structural models of hot Jupiters (e.g.,
Burrows 2005, and references therein). The special geometry
of a transiting planet also permits a number of follow-up stud- 
ies, which in particular have enabled direct observation of their 
transmission spectra and emitted radiation (Charbonneau et al. 
2007, and references therein). These data provide the first obser- 
vational constraints on atmospheric models of these extrasolar 
gas giants (Burrows 2005, and references therein). 

The next logical step, the direct detection of extrasolar giant
planets using high-contrast imaging instruments, requires
that their dim light be separated from under the glare of their
bright parent stars. Several theoretical studies (Hubbard et al.
2002; Baraffe et al. 2003; Sudarsky et al. 2005; Dyudina et al.
2005; Burrows et al. 2004, 2007) have discussed exoplanet
apparent brightness in reflected host star light (expressed in units
of the planet/host star flux ratio log(F_pl/F_star)) as functions of
orbit geometry, orbital phase, cloud cover, cloud composition,
mass and age. In particular, the orbit and orientation of an
extrasolar planet play a crucially important role in its flux at the
Earth and in its interpretation, with strong dependence on
eccentricity and inclination (Burrows et al. 2004). Depending upon
e and i, log(F_pl/F_star) can be essentially constant (in the case of
e ≈ 0.0, i ≈ 0°, for example), or vary by over an order of
magnitude (in the case of e ≈ 0.6, i ≈ 90°, for example) along the
orbit of an exoplanet, and this can induce significant changes in
the chemical composition of its atmosphere (e.g., from cloudy to
cloud-free).
As for the knowledge of the actual mass of the planet, particularly
at young ages theory predicts that changes in intrinsic luminosity
by a factor of nearly 100 can occur between objects in the
mass range 1 M_J < M_p < 5 M_J. The few wide-separation
sub-stellar companions detected to date by means of direct imaging
techniques (Chauvin et al. 2005a, 2005b; Neuhauser et al.
2005; Biller et al. 2006) have planetary-mass solutions within
their error bars, but these mass estimates rely upon so far poorly
calibrated theoretical mass-luminosity relationships, thus their
actual nature (planets or brown dwarfs) remains highly uncertain.
It is then clear how accurate knowledge of all orbital parameters
and actual mass is essential for understanding the
thermophysical conditions on a planet and determining its
visibility. Recently, the first prediction of epoch and location of
maximum brightness was derived for the giant planet orbiting
ε Eridani using HST/FGS astrometry in combination with
high-precision radial velocities (Benedict et al. 2006). As of today,
there are some 20 RV-detected exoplanets with M_p sin i > 1 M_J,
P > 1 yr and projected separations > 0.1 arcsec (the typical
size of the Inner Working Angle of coronagraphic instruments
presently under study) for which Gaia could provide information
on where and when to observe, and presumably several tens
more will be discovered in the next several years by Doppler
surveys and by Gaia itself. Gaia's ability to accurately measure
orbital parameters (including inclination) and actual mass of a
planet through high-precision astrometric measurements would
then provide important supplementary data to aid in the
interpretation of direct detections of exoplanets.

4.2.5. How common are the terrestrial planets? 

With the advent of the new generation of ultra-high precision
spectrographs such as HARPS (e.g., Pepe et al. 2004),
radial-velocity programs achieving < 1 m s^-1 measurement precision
have begun detecting, around nearby M dwarfs, close-in planets
with M_p sin i ≈ 5-10 M_⊕ (Rivera et al. 2005; Lovis et al. 2006;
Udry et al. 2007), so-called 'super-Earths', likely to be mostly
'rocky' in composition. One of them, GJ 581d (Udry et al. 2007),
may orbit within the Habitable Zone of the parent star, depending
on the assumed exoplanet atmosphere (Selsis et al. 2007; von
Bloh et al. 2007). The announcement of the discovery of a
short-period habitable terrestrial planet around a low-mass star might
well be just around the corner. However, the strongest statistical
constraints (including bona-fide detections) on the frequency of
Earth-sized habitable planets orbiting Sun-like stars will likely
come from currently operating and upcoming space-borne
observatories devoted to ultra-high precision transit photometry,
such as CoRot (Baglin et al. 2002) and Kepler (Borucki et al.
2003), and very high-precision narrow-angle astrometry, such as
SIM (Beichman et al. 2007, and references therein).

The next challenging step will be to directly detect and
characterize terrestrial, habitable planets orbiting stars very close
(d < 25 pc) to our Sun, searching for elements in their
atmospheres that can be interpreted as 'bio-markers' (Hitchcock
& Lovelock 1967; Des Marais et al. 2002; Seager et al. 2005;
Tinetti et al. 2007; Kaltenegger et al. 2007), implying the likely
existence of a complex biology on the surface. Imaging
terrestrial planets is presently the primary science goal of the
coronagraphic and interferometric configurations of the Terrestrial
Planet Finder (TPF-C & TPF-I) and Darwin missions (Beichman
et al. 2007, and references therein). Ultimately, the final list of
targets will be formulated taking into account constraints
coming from the knowledge of f_p in the terrestrial mass regime,
potential stellar host characteristics (spectral type, binarity,
surface activity), and environment. In this respect, Gaia astrometry
of all nearby stars, including the large numbers of M dwarfs,
within 25 pc from the Sun will be an essential ingredient in
order to provide Darwin/TPF with a comprehensive database of
F-G-K-M stars with and without detected giant planets orbiting
out to several AU, from which to choose additional targets
based on the presence or absence of Jupiter signposts (Sozzetti et
al. 2003b). Such measurements will uniquely complement ongoing
and planned radial-velocity programs and exo-zodiacal dust
emission observations from the ground with Keck-I, LBTI, and
VLTI.



5. Summary and conclusions 

We have presented results from an extensive program of
double-blind tests for planet detection and measurement with Gaia. The
main findings obtained in this study include: a) an improved,
more realistic assessment of the detectability and measurability
of single and multiple planets under a variety of conditions,
parametrized by the sensitivity of Gaia, and b) an assessment of
the impact of Gaia in critical areas of planet research, as a
function of its expected capabilities.

Overall, the results of our earlier works (Lattanzi et al.
2000a; Sozzetti et al. 2001, 2003a) are essentially confirmed,
with a fundamental improvement due to the successful
development of independent orbital fitting algorithms applicable to
real-life data that do not utilize any a priori knowledge of the
orbital parameters of the planets. In particular, the results of the T1
test (planet detection) indicate that planets down to astrometric
signatures α ≈ 25 μas, corresponding to ~ 3 times the assumed
single-measurement error, can be detected reliably and
consistently, with a very small number of false positives (depending
on the specific choice of the threshold for detection).
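
To make the detection threshold concrete, the sketch below evaluates the astrometric signature with the standard small-mass-ratio expression α = (M_p/M_*)(a_p/d), which gives arcseconds when the planet's semi-major axis a_p is in AU and the distance d in pc, and compares it with a single-measurement error of 8 μas (the value quoted later in this section). The function names, the Jupiter-to-solar mass ratio, and the hard threshold of 3 are illustrative assumptions, not the statistical test used in the T1 experiment.

```python
M_JUP_IN_MSUN = 9.546e-4  # approximate Jupiter mass in solar masses


def astrometric_signature_uas(m_planet_mjup, m_star_msun, a_au, d_pc):
    """Astrometric signature in micro-arcseconds.

    alpha = (M_p / M_star) * (a_p / d), in arcsec for a_p in AU and d in pc
    (valid when the planet is much less massive than the star).
    """
    mass_ratio = m_planet_mjup * M_JUP_IN_MSUN / m_star_msun
    return mass_ratio * (a_au / d_pc) * 1.0e6


def is_detectable(alpha_uas, sigma_psi_uas=8.0, threshold=3.0):
    """Rule of thumb: signal at least `threshold` times the single-measurement error."""
    return alpha_uas / sigma_psi_uas >= threshold


if __name__ == "__main__":
    # Hypothetical Jupiter-mass planet at 3 AU around a solar-mass star at 100 pc.
    alpha = astrometric_signature_uas(1.0, 1.0, 3.0, 100.0)
    print(round(alpha, 1), is_detectable(alpha))
```

With the GJ 581b entries listed in the table above (d = 6.26 pc, 0.31 M_⊙, 0.05 M_J, 0.04 AU) the same expression gives about 1 μas, in line with the tabulated value.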

The results of the T2 test (single-planet orbital solutions)
indicate that: 1) orbital periods can be retrieved with very good
accuracy (better than 10%) and small bias in the range 0.3 < P < 6
yr, and in this period range the other orbital parameters and
the planet mass are similarly well estimated. The quality of the
solutions degrades quickly for periods longer than the mission
duration, and in particular the fitted value of P is systematically
underestimated; 2) uncertainties in orbit parameters are
well understood; 3) nominal uncertainties obtained from the
fitting procedure are a good estimate of the actual errors in the
orbit reconstruction. Modest discrepancies between estimated and
actual errors arise only for planets with extremely good signal
(errors are overestimated) and for planets with very long period
(errors are underestimated); such discrepancies are of interest
mainly for a detailed numerical analysis, but they do not
significantly affect the assessment of Gaia's ability to find planets and
our preparedness for the analysis of perturbation data.

The results of the T3 test (multiple-planet orbital solutions)
indicate that: 1) over 70% of the simulated orbits under the
conditions of the T3 test (for every two-planet system, periods
shorter than 9 years and differing by at least a factor of two,
2 < α/σ_ψ < 50, e < 0.6) are correctly identified; 2) favorable
orbital configurations (both planets with periods < 4 yr and
astrometric signal-to-noise ratio α/σ_ψ > 10, redundancy of over a
factor of 2 in the number of observations) have periods measured
to better than 10% accuracy > 90% of the time, and comparable
results hold for other orbital elements; 3) for these favorable
cases, only a modest degradation of up to 10% in the fraction
of well-measured orbits is observed with respect to single-planet
solutions with comparable properties; 4) the overall results are
mostly insensitive to the mutual inclination of pairs of planetary
orbits; 5) over 80% of the favorable configurations have i_rel
measured to better than 10° accuracy, with only mild dependencies
on its actual value, or on the inclination angle with respect to the
line of sight of the planets; 6) error estimates are generally
accurate, particularly for fitted parameters, while modest
discrepancies (errors are systematically underestimated) arise between
formal and actual errors on i_rel.

Then, we attempted to put Gaia's potential for planet detection
and measurement in context, by identifying several areas of
planetary science in which Gaia can be expected, on the basis
of our results, to have a dominant impact, and by delineating
a number of recommended research programs that can be
conducted successfully by the mission as planned. In conclusion,
Gaia's main strength continues to be the unbiased and complete
magnitude-limited census of stars of all ages, spectral types, and
metallicities in the solar neighborhood that will be screened for
new planets, which translates into the ability to measure actual
masses and orbital parameters for possibly thousands of
planetary systems.

The Gaia data have the potential to a) significantly refine
our understanding of the statistical properties of extrasolar
planets: the predicted database of several thousand extrasolar planets
with well-measured properties will make it possible, for example,
to test the fine structure of giant planet parameter distributions
and frequencies, and to investigate their possible changes as a
function of stellar mass with unprecedented resolution; b) help
crucially test theoretical models of gas giant planet formation and
migration: for example, specific predictions on formation time-scales
and the role of varying metal content in the protoplanetary disk
will be probed with unprecedented statistics thanks to the
thousands of metal-poor stars and hundreds of young stars screened
for giant planets out to a few AU; c) improve our comprehension
of the role of dynamical interactions in the early as well
as long-term evolution of planetary systems: for example, the
measurement of orbital parameters for hundreds of multiple-planet
systems, including meaningful coplanarity tests, will make it
possible to discriminate between various proposed mechanisms for
eccentricity excitation; d) aid in the understanding of direct
detections of giant extrasolar planets: for example, actual mass
estimates and full orbital geometry determination (including
inclination angles) for suitable systems will inform direct imaging
surveys about where and when to point, in order to estimate
optimal visibility, and will help in the modeling and interpretation
of giant planets' phase functions and light curves; e) provide
important supplementary data for the optimization of the
selection of targets for Darwin/TPF: for example, all F-G-K-M
stars within the useful volume (~ 25 pc) will be screened for
Jupiter- and Saturn-sized planets out to several AU, and these
data will help to probe the long-term dynamical stability of their
Habitable Zones, where terrestrial planets may have formed, and
may be found.

We conclude by providing a word of caution, in light of the
possible degradations in the expected Gaia astrometric precision
on bright stars (V < 13). Indeed, refinements in the overall Gaia
error model (which includes centroiding as well as systematic
uncertainties due to a variety of calibration errors) are still
possible, and a better understanding of some of the many effects that
need to be taken into account may help reduce the present-day
end-of-mission scientific contingency margin of ~ 20%, which
is included to account for discrepancies that may occur between
the simplified error-budget assessment performed now and the
true performances on real data. However, if ultimately a
degradation of 35%-40% in the single-measurement precision on bright
stars were to be confirmed, the Gaia science case for exoplanets
would be affected to a non-negligible degree. For example, by
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Table 6. Number of single- and multiple-planet systems detected 
and measured by Gaia as a function of cr^. 
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" Single-measurement precision 

Number of stars within the useful distance, assumed to scale with
the cube of the radius (in pc) of a sphere centered around the Sun

Number of single-planet systems detected

Number of single-planet systems whose astrometric orbits are
measured to better than 15% accuracy

Number of multiple-planet systems detected

Number of multiple-planet systems with orbits measured to better
than 15%-20% accuracy

Number of multiple-planet systems for which successful coplanarity
tests (with i_rel known to better than 10° accuracy) can be carried out.

simply scaling with the value of the astrometric signal needed
for detection and measurement of the orbital parameters to
15%-20% (α/σ_ψ ≈ 3-5, see Figure 21), as σ_ψ increases the same
type of system (same stellar mass, same planet mass, same
orbital period) would be characterized at increasingly shorter
distances. A comparison between numbers of detectable and
measurable single- and multiple-planet systems as a function of
increasing Gaia single-measurement error is presented in Table 6.
Assuming that the number of objects scales with the cube of the
radius (in pc) of a sphere centered around the Sun (with no
distinction of spectral types), if σ_ψ degrades from 8 μas to 12 μas
(closer to the present-day estimate) then this would correspond
to a reduction of a factor ~ 1.5 in the distance limit and to a
corresponding decrease in the number of stars available for
investigation from ~ 5 × 10^5 to ~ 1.5 × 10^5. If σ_ψ were to worsen by
a factor 2, the number of stars available for planet detection and
measurement (~ 6 × 10^4) would be reduced by about an order of
magnitude. Accordingly, the expected numbers of giant planets
detected and measured would decrease from ~ 4000 to ~ 1200
and ~ 500, respectively, and the number of multiple systems
for which coplanarity could be established would diminish from
~ 160 to ~ 50 and ~ 20, respectively. We conclude that a factor 2
degradation in astrometric precision would severely impact most
of the Gaia exoplanet science case. We are aware that, instead of
using simple scaling laws, one should provide more quantitative
statements based on new simulations. However, this activity will
necessarily be tied to further developments of the understanding
of the technical specifications of Gaia and its instruments,
and of its observation and data analysis process; therefore, we
plan to revisit these issues as needed in the future, depending on
the actual evolution of the knowledge of the Gaia measurement
process.
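
The scaling argument above can be reproduced with a few lines of arithmetic. The sketch below assumes, as in the text, that the limiting distance for a given system is inversely proportional to the single-measurement error and that counts grow with the cube of that distance; the function name is hypothetical and the reference counts are the round numbers quoted in this section.

```python
def scale_with_precision(n_ref, sigma_ref_uas, sigma_new_uas):
    """Rescale a count when the single-measurement error changes.

    Assumes distance limit proportional to 1/sigma and counts
    proportional to distance cubed (uniform space density).
    """
    distance_factor = sigma_ref_uas / sigma_new_uas
    return n_ref * distance_factor ** 3


if __name__ == "__main__":
    for sigma_new in (12.0, 16.0):
        stars = scale_with_precision(5.0e5, 8.0, sigma_new)
        planets = scale_with_precision(4000.0, 8.0, sigma_new)
        coplanar = scale_with_precision(160.0, 8.0, sigma_new)
        print(sigma_new, round(stars), round(planets), round(coplanar))
```

Under these assumptions the 12 μas case gives roughly 1.5 × 10^5 stars, 1200 planets and 50 coplanarity tests, and the 16 μas case roughly 6 × 10^4, 500 and 20, matching the rounded values in the text.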
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Appendix A: The Simulated IVIodel 

The code for the generation of synthetic Gaia observations of planetary systems is run by the Simulators group. 

We start by generating spheres of targets. Each target's two-dimensional position is described in the ecliptic reference frame
via a set of two coordinates λ_b and β_b, called here barycentric coordinates. We linearly update the barycentric position as a function
of time, accounting for the (secular) effects of proper motion (two components, μ_λ and μ_β), the (periodic) effect of the parallax
π, and the (Keplerian) gravitational perturbations induced on the parent star by one or more orbiting planets (mutual interactions
between planets are presently not taken into account). The model of motion can thus be expressed as follows:



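$$
\mathbf{x}_{\rm ecl} \equiv \begin{pmatrix} x_{\rm ecl} \\ y_{\rm ecl} \\ z_{\rm ecl} \end{pmatrix}
= \mathbf{x}^{0}_{\rm ecl} + \mathbf{x}'_{\mu,\pi} + \sum_{j=1}^{n_p} \mathbf{x}'_{K,j}
\qquad \mathrm{(A.1)}
$$

Where:

$$
\mathbf{x}^{0}_{\rm ecl} = \begin{pmatrix} \cos\beta_b \cos\lambda_b \\ \cos\beta_b \sin\lambda_b \\ \sin\beta_b \end{pmatrix}
$$

is the initial position vector of the system barycenter. The various perturbative effects are initially defined in the tangent plane. The
parallax and proper motion terms contribute as: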



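$$
\mathbf{x}_{\mu,\pi} = \begin{pmatrix} \mu_\lambda t + \pi F_\lambda \\ \mu_\beta t + \pi F_\beta \\ 0 \end{pmatrix}
$$

Where the parallax factors are defined utilizing the classic formulation by Green (1985):

$$
F_\lambda = -\sin(\lambda_b - \lambda_\odot), \qquad
F_\beta = -\sin\beta_b \cos(\lambda_b - \lambda_\odot)
$$

and λ_⊙ is the Sun's longitude at the given time t. The term describing the Keplerian motion of the j-th planet in the tangent plane is: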









$$
\mathbf{x}_{K,j} = \begin{pmatrix} x_{K,j} \\ y_{K,j} \\ 0 \end{pmatrix}
= \begin{pmatrix} \rho_j \cos\vartheta_j \\ \rho_j \sin\vartheta_j \\ 0 \end{pmatrix}
$$

where ρ_j is the separation and ϑ_j the position angle. The two coordinates x_{K,j} and y_{K,j} are functions of the 7 orbital elements:

$$
x_{K,j} = \alpha_j (1 - e_j \cos E_j)\,[\cos(\nu_j + \omega_j)\cos\Omega_j - \sin(\nu_j + \omega_j)\sin\Omega_j \cos i_j]
\qquad \mathrm{(A.2)}
$$

$$
y_{K,j} = \alpha_j (1 - e_j \cos E_j)\,[\cos(\nu_j + \omega_j)\sin\Omega_j + \sin(\nu_j + \omega_j)\cos\Omega_j \cos i_j]
\qquad \mathrm{(A.3)}
$$

where i_j is the inclination of the orbital plane, ω_j is the longitude of the pericenter, Ω_j is the position angle of the line of nodes, e_j
is the eccentricity, and α_j is the apparent semi-major axis of the star's orbit around the system barycenter, i.e. the astrometric signature.
E_j, the eccentric anomaly, is the solution to Kepler's Equation:



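$$
E_j - e_j \sin E_j = M_j , \qquad \mathrm{(A.4)}
$$

with the mean anomaly M_j expressed in terms of the orbital period P_j and the epoch of the pericenter passage τ_j:

$$
M_j = \frac{2\pi}{P_j}\,(t - \tau_j) . \qquad \mathrm{(A.5)}
$$

Finally, the true anomaly ν_j is a function of the eccentricity and the eccentric anomaly:

$$
\nu_j = 2 \arctan\left[ \left( \frac{1 + e_j}{1 - e_j} \right)^{1/2} \tan\frac{E_j}{2} \right] . \qquad \mathrm{(A.6)}
$$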
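
The chain from mean anomaly to tangent-plane displacement (Eqs. A.2 to A.6) can be sketched numerically as follows. This is an illustrative sketch, not the simulation code described in this appendix: the Newton-Raphson solver, its starting guess and tolerance, and all function and argument names are assumptions. Angles are in radians; the displacement is returned in the units of the astrometric signature.

```python
import math


def solve_kepler(mean_anomaly, ecc, tol=1e-12, max_iter=50):
    """Solve E - e sin E = M (Eq. A.4) for E by Newton-Raphson iteration."""
    m = mean_anomaly % (2.0 * math.pi)
    big_e = m if ecc < 0.8 else math.pi  # common choice of starting guess
    for _ in range(max_iter):
        delta = (big_e - ecc * math.sin(big_e) - m) / (1.0 - ecc * math.cos(big_e))
        big_e -= delta
        if abs(delta) < tol:
            break
    return big_e


def true_anomaly(big_e, ecc):
    """True anomaly from the eccentric anomaly (Eq. A.6)."""
    factor = math.sqrt((1.0 + ecc) / (1.0 - ecc))
    return 2.0 * math.atan(factor * math.tan(big_e / 2.0))


def tangent_plane_offset(t, alpha, period, t_peri, ecc, inc, omega, node):
    """Stellar reflex displacement (x_K, y_K) of Eqs. A.2 and A.3."""
    mean_anomaly = 2.0 * math.pi * (t - t_peri) / period  # Eq. A.5
    big_e = solve_kepler(mean_anomaly, ecc)
    u = true_anomaly(big_e, ecc) + omega
    r = alpha * (1.0 - ecc * math.cos(big_e))
    x = r * (math.cos(u) * math.cos(node) - math.sin(u) * math.sin(node) * math.cos(inc))
    y = r * (math.cos(u) * math.sin(node) + math.sin(u) * math.cos(node) * math.cos(inc))
    return x, y


if __name__ == "__main__":
    # Hypothetical planet: 50 micro-arcsec signature, 3 yr period, e = 0.3.
    print(tangent_plane_offset(1.2, 50.0, 3.0, 0.0, 0.3, 0.8, 0.5, 1.0))
```

At pericenter the displacement has modulus α(1 - e), which provides a quick sanity check of any implementation.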



We then rotate on the ecliptic reference frame by means of the transformation matrix: 

' - sin Ab - sin /3b cos Ab cos f^b cos Ab \ 
R(Ab,/3b) - cos Ab -sinPbsin Ah cosfihsin Ah 
cosySfc sin /3h 



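The other two vectors in Eq. (A.1) are thus defined as

$$\mathbf{x}^{K,j}_{\rm ecl} = R(\lambda_b, \beta_b) \cdot \mathbf{x}_{K,j} = \begin{pmatrix} -\sin\lambda_b\, \rho_j \cos\theta_j - \sin\beta_b \cos\lambda_b\, \rho_j \sin\theta_j \\ \cos\lambda_b\, \rho_j \cos\theta_j - \sin\beta_b \sin\lambda_b\, \rho_j \sin\theta_j \\ \cos\beta_b\, \rho_j \sin\theta_j \end{pmatrix}$$

$$\mathbf{x}^{\mu,\pi}_{\rm ecl} = R(\lambda_b, \beta_b) \cdot \mathbf{x}_{\mu,\pi} = \begin{pmatrix} -\sin\lambda_b [\mu_\lambda t + \pi F_\lambda] - \sin\beta_b \cos\lambda_b [\mu_\beta t + \pi F_\beta] \\ \cos\lambda_b [\mu_\lambda t + \pi F_\lambda] - \sin\beta_b \sin\lambda_b [\mu_\beta t + \pi F_\beta] \\ \cos\beta_b [\mu_\beta t + \pi F_\beta] \end{pmatrix}$$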

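This allows us to write Eq. (A.1) in the form:

$$\begin{pmatrix} x_{\rm ecl} \\ y_{\rm ecl} \\ z_{\rm ecl} \end{pmatrix} = \begin{pmatrix} \cos\beta_b \cos\lambda_b - \sin\lambda_b \sum_{j=1}^{n_p} \rho_j \cos\theta_j - \sin\beta_b \cos\lambda_b \sum_{j=1}^{n_p} \rho_j \sin\theta_j - \sin\lambda_b [\mu_\lambda t + \pi F_\lambda] - \sin\beta_b \cos\lambda_b [\mu_\beta t + \pi F_\beta] \\ \cos\beta_b \sin\lambda_b + \cos\lambda_b \sum_{j=1}^{n_p} \rho_j \cos\theta_j - \sin\beta_b \sin\lambda_b \sum_{j=1}^{n_p} \rho_j \sin\theta_j + \cos\lambda_b [\mu_\lambda t + \pi F_\lambda] - \sin\beta_b \sin\lambda_b [\mu_\beta t + \pi F_\beta] \\ \sin\beta_b + \cos\beta_b \sum_{j=1}^{n_p} \rho_j \sin\theta_j + \cos\beta_b [\mu_\beta t + \pi F_\beta] \end{pmatrix} \tag{A.7}$$

Finally, a rotation to the local reference frame defined by the Instantaneous Great Circles (IGC) is made by means of the transformation matrix (e.g., ESA 1997):

$$\mathbf{x}_{\rm IGC} = R(\lambda_p, \beta_p) \cdot \mathbf{x}_{\rm ecl},$$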

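where:

$$R(\lambda_p, \beta_p) = \begin{pmatrix} -\sin\lambda_p & \cos\lambda_p & 0 \\ -\sin\beta_p \cos\lambda_p & -\sin\beta_p \sin\lambda_p & \cos\beta_p \\ \cos\beta_p \cos\lambda_p & \cos\beta_p \sin\lambda_p & \sin\beta_p \end{pmatrix}$$

and $\lambda_p$, $\beta_p$ are the coordinates of the pole of the IGC at any given time. The resulting vector can be expressed in terms of the two angular coordinates $\psi$ and $\eta$: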









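$$\mathbf{x}_{\rm IGC} = \begin{pmatrix} x_{\rm IGC} \\ y_{\rm IGC} \\ z_{\rm IGC} \end{pmatrix} = \begin{pmatrix} \cos\psi \cos\eta \\ \cos\eta \sin\psi \\ \sin\eta \end{pmatrix}$$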
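The chain of rotations described above can be sketched as follows. This is only an illustrative implementation under stated assumptions: the function names are hypothetical, the total tangent-plane offsets `dx` and `dy` (proper motion, parallax, and Keplerian terms summed, in radians) are taken as inputs rather than computed, and the position vector is renormalized before the angles are extracted, which the text does not discuss explicitly.

```python
import math


def rot_tangent_to_ecliptic(lam_b, beta_b):
    """R(lambda_b, beta_b): its columns are the unit vectors along increasing
    longitude, increasing latitude, and the radial direction."""
    sl, cl = math.sin(lam_b), math.cos(lam_b)
    sb, cb = math.sin(beta_b), math.cos(beta_b)
    return [[-sl, -sb * cl, cb * cl],
            [cl, -sb * sl, cb * sl],
            [0.0, cb, sb]]


def rot_ecliptic_to_igc(lam_p, beta_p):
    """R(lambda_p, beta_p): sends the IGC pole direction to (0, 0, 1)."""
    sl, cl = math.sin(lam_p), math.cos(lam_p)
    sb, cb = math.sin(beta_p), math.cos(beta_p)
    return [[-sl, cl, 0.0],
            [-sb * cl, -sb * sl, cb],
            [cb * cl, cb * sl, sb]]


def matvec(m, v):
    """Product of a 3x3 matrix (list of rows) with a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def igc_angles(lam_b, beta_b, lam_p, beta_p, dx, dy):
    """Return (psi, eta) for a star with barycentric ecliptic coordinates
    (lam_b, beta_b), tangent-plane offsets (dx, dy), and IGC pole (lam_p, beta_p)."""
    x0 = [math.cos(beta_b) * math.cos(lam_b),
          math.cos(beta_b) * math.sin(lam_b),
          math.sin(beta_b)]
    offset = matvec(rot_tangent_to_ecliptic(lam_b, beta_b), [dx, dy, 0.0])
    x_ecl = [x0[i] + offset[i] for i in range(3)]
    x, y, z = matvec(rot_ecliptic_to_igc(lam_p, beta_p), x_ecl)
    norm = math.sqrt(x * x + y * y + z * z)  # renormalization: assumption of this sketch
    psi = math.atan2(y, x)
    eta = math.asin(z / norm)
    return psi, eta
```

As a sanity check, placing the IGC pole at the ecliptic pole ($\beta_p = \pi/2$, $\lambda_p = 0$) makes $\eta$ coincide with the ecliptic latitude of the star, and $\psi$ differ from its longitude by a constant.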



By expanding the IGC Cartesian position vector of each target in a Taylor series to first order, it is possible to derive a set of linearized equations of condition expressing only the observed abscissa as a function of all astrometric parameters and orbital elements. We formally have:

$$\delta\mathbf{x}_{\rm IGC} = \sum_{m=1}^{n} \frac{\partial \mathbf{x}_{\rm IGC}}{\partial a_m}\, da_m \tag{A.8}$$



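The $n$ unknowns $a_m$ represent positions, proper motions, parallax, and the $7 \times n_p$ orbital elements (if the star is not single). Now consider that:

$$\begin{aligned}
\delta\mathbf{x}_{\rm IGC} &= \delta(x_{\rm IGC}, y_{\rm IGC}, z_{\rm IGC}) = (\delta(\cos\psi \cos\eta),\ \delta(\sin\psi \cos\eta),\ \delta \sin\eta) \\
&= (-\sin\psi \cos\eta\, d\psi - \sin\eta \cos\psi\, d\eta,\ \cos\psi \cos\eta\, d\psi - \sin\eta \sin\psi\, d\eta,\ \cos\eta\, d\eta) \\
&= (-\sin\psi \cos\eta\, d\psi,\ \cos\psi \cos\eta\, d\psi,\ 0) + (-\sin\eta \cos\psi\, d\eta,\ -\sin\eta \sin\psi\, d\eta,\ \cos\eta\, d\eta) \\
&= \cos\eta\, (-\sin\psi,\ \cos\psi,\ 0)\, d\psi + (-\sin\eta \cos\psi,\ -\sin\eta \sin\psi,\ \cos\eta)\, d\eta \\
&= \cos\eta\, d\psi\, \mathbf{e}_\psi + d\eta\, \mathbf{e}_\eta,
\end{aligned}$$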

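where $\mathbf{e}_\psi$ and $\mathbf{e}_\eta$ constitute the pair of orthogonal unit vectors in the directions parallel to $\psi$ and $\eta$, as defined in the tangent plane. We then have:

$$\cos\eta\, d\psi\, \mathbf{e}_\psi + d\eta\, \mathbf{e}_\eta = \sum_{m=1}^{n} \frac{\partial \mathbf{x}_{\rm IGC}}{\partial a_m}\, da_m. \tag{A.9}$$

By taking the scalar product with $\mathbf{e}_\psi$, we obtain the following scalar expression:

$$\cos\eta\, d\psi = (-\sin\psi) \sum_{m=1}^{n} \frac{\partial x_{\rm IGC}}{\partial a_m}\, da_m + (\cos\psi) \sum_{m=1}^{n} \frac{\partial y_{\rm IGC}}{\partial a_m}\, da_m. \tag{A.10}$$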
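The projection step can be checked numerically. The short sketch below (function names are hypothetical, and the test values of $\psi$, $\eta$, $d\psi$, $d\eta$ are arbitrary) displaces a unit vector by small increments in $\psi$ and $\eta$, projects the resulting $\delta\mathbf{x}_{\rm IGC}$ onto $\mathbf{e}_\psi = (-\sin\psi, \cos\psi, 0)$, and compares the result with $\cos\eta\, d\psi$. The two should agree up to second-order terms in the increments.

```python
import math


def igc_vector(psi, eta):
    """Unit vector (cos psi cos eta, sin psi cos eta, sin eta)."""
    return (math.cos(psi) * math.cos(eta),
            math.sin(psi) * math.cos(eta),
            math.sin(eta))


def along_scan_projection(psi, eta, dpsi, deta):
    """Scalar product of the finite displacement delta-x with e_psi."""
    x0 = igc_vector(psi, eta)
    x1 = igc_vector(psi + dpsi, eta + deta)
    delta_x = [b - a for a, b in zip(x0, x1)]
    e_psi = (-math.sin(psi), math.cos(psi), 0.0)
    return sum(u * v for u, v in zip(e_psi, delta_x))


# Arbitrary test values (radians); the increments are deliberately small
# so that the first-order expansion applies.
psi, eta, dpsi, deta = 0.7, 0.01, 1e-6, 2e-6
lhs = along_scan_projection(psi, eta, dpsi, deta)
rhs = math.cos(eta) * dpsi
```

Here `lhs` and `rhs` differ only by a term of order $\sin\eta\, d\eta\, d\psi$, which illustrates why the across-scan increment $d\eta$ drops out of the condition equation at first order.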



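If we now define:

$$c_{a_m} = (-\sin\psi) \frac{\partial x_{\rm IGC}}{\partial a_m} + (\cos\psi) \frac{\partial y_{\rm IGC}}{\partial a_m},$$

then the linearized condition equation takes the form:

$$\cos\eta\, d\psi = \sum_{m=1}^{n} c_{a_m}\, da_m = F(\lambda, \beta, \mu_\lambda, \mu_\beta, \pi, a_j, P_j, \tau_j, \omega_j, \Omega_j, e_j, i_j), \qquad j = 1, \ldots, n_p$$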

For each given target, there will be as many equations of this form as the number of observation epochs. The quantity $d\psi = \psi_{\rm obs} - \psi_{\rm cat}$ is defined as the difference between the observed and catalog abscissa.
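Since each observation epoch contributes one linear equation in the corrections $da_m$, the system is overdetermined whenever the number of epochs exceeds $n$. The text does not specify how the system is solved; the sketch below assumes an unweighted least-squares solution via the normal equations, with hypothetical function names and a deliberately simple toy model (a constant offset plus a linear drift, so that the coefficients at epoch $t_k$ are $1$ and $t_k$). It is meant only to show the structure of the problem, not the fitting procedures used in the test campaign.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Assumes a is square and non-singular."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def least_squares(rows, rhs):
    """rows[k][m] holds c_{a_m} at epoch k; rhs[k] holds cos(eta_k) * dpsi_k.
    Returns the corrections da_m minimizing the sum of squared residuals."""
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * y for r, y in zip(rows, rhs)) for i in range(n)]
    return solve_linear(ata, atb)


# Toy example (hypothetical numbers): two unknowns, five noise-free epochs.
epochs = [-2.0, -1.0, 0.0, 1.0, 2.0]
true_da = [3.0e-9, -1.5e-9]
rows = [[1.0, t] for t in epochs]
rhs = [sum(c * d for c, d in zip(row, true_da)) for row in rows]
est = least_squares(rows, rhs)
```

With noise-free synthetic residuals, `est` reproduces `true_da` to machine precision. In a realistic setting the rows would be weighted by the measurement errors and, because the orbital model is nonlinear, the linearized solution would be iterated.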